Intro
Machine learning is becoming an ever more important field. It's used pretty much everywhere: by small-scale businesses, by theoretical physicists, and in health care. The possibilities are endless!
But hang on there, how does machine learning actually work? Well, machine learning is, just like many other cool technologies, powered by math! In particular, you'll need the concept of gradient descent. Basically, it's a method that lets the machine become less and less wrong, step by step. So without math, machine learning would be impossible!
Concept
The gradient is basically a vector with all the partial derivatives.
But what's cool about the gradient isn't that it's more information-rich than an ordinary partial derivative. I mean, who cares whether you're able to compute another partial derivative? So what?
Well, the gradient always points in the direction of steepest ascent, and it's perpendicular to contour lines. These properties make the gradient pretty awesome.
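Since the gradient points in the direction of steepest ascent, stepping against it makes the function value smaller. That is the idea behind gradient descent. Here is a minimal sketch in Python, assuming a made-up function f(x, y) = (x - 1)^2 + (y + 2)^2 whose lowest point is at (1, -2). The function, the starting point, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
def grad_f(x, y):
    # Gradient of the hypothetical function f(x, y) = (x - 1)^2 + (y + 2)^2
    return (2 * (x - 1), 2 * (y + 2))


def gradient_descent(start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from `start`."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Move in the direction of steepest descent (minus the gradient)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


minimum = gradient_descent((5.0, 5.0))
print(minimum)  # ends up very close to (1, -2)
```

With a learning rate that is too large the steps can overshoot, so the value 0.1 here is only a reasonable choice for this particular function.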
Math
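The gradient of a scalar function $f(x_1, \dots, x_n)$ of several variables is the vector
$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$
The two most important properties of the gradient are
$\nabla f$ points in the direction in which $f$ increases the most.
$\nabla f$ is perpendicular to the level curves of $f$.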
For example, the gradient to the function is
At the point , the function increases the most if you walk in the direction of
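If you want to check a gradient you computed by hand, you can approximate each partial derivative numerically. The sketch below uses central differences on a made-up function f(x, y) = x^2 + 3xy. The function, the point (1, 2) and the step size h are assumptions for illustration, not values from these notes.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th variable up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(p):
    # Hypothetical example function f(x, y) = x^2 + 3xy
    x, y = p
    return x**2 + 3 * x * y


grad = numerical_gradient(f, [1.0, 2.0])
print(grad)  # close to the analytic gradient (2x + 3y, 3x) = (8, 3)
```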
Del operator
The symbol $\nabla$ is called nabla, and it is used to denote one of the most important constructs in the field of calculus, the del operator.
We have previously been exposed to the simple differential operator $\frac{d}{dx}$, which acts on a single-variable function $f(x)$ to produce its derivative $f'(x)$:
$$\frac{d}{dx} f(x) = f'(x)$$
Then, we extended the concept of the derivative to functions of several variables, through partial derivatives.
As it turns out, these are both cases of applying the del operator.
The del operator is a vector of partial derivative operators, with one component for each variable of the function it operates on.
Let's dig a bit deeper to see how all this ties together.
The del operator
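The del operator in the $n$-dimensional coordinate system $\mathbb{R}^n$, with variables $x_1, \dots, x_n$, is defined as:
$$\nabla = \left( \frac{\partial}{\partial x_1}, \dots, \frac{\partial}{\partial x_n} \right)$$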
Just as multiplying a regular vector by a scalar multiplies each component separately, each component of this vector of partial derivative operators acts on a scalar function one by one.
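Applying the del operator to a function $f(x)$ in $\mathbb{R}$:
$$\nabla f(x)$$
results in only one component, and becomes nothing but the simple differential operator:
$$\nabla f(x) = \frac{d}{dx} f(x)$$
Similarly, applying the del operator to a function $f(x, y)$ in $\mathbb{R}^2$:
$$\nabla f(x, y)$$
gives us the partial derivatives, with respect to $x$ and $y$ respectively, as the two components of a vector
$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$
Further, we can operate on a function $f(x_1, \dots, x_n)$ in $n$ dimensions with del, to obtain a vector of partial derivatives:
$$\nabla f(x_1, \dots, x_n) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$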
These examples of acting on scalar functions with the del operator yield special types of vectors called gradients.
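To make this concrete, here is a small worked example. The function $f(x, y) = x^2 y$ and the point $(1, 2)$ are made up for illustration.

```latex
\begin{align*}
f(x, y) &= x^2 y \\
\nabla f(x, y) &= \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right) = \left( 2xy,\ x^2 \right) \\
\nabla f(1, 2) &= (4,\ 1)
\end{align*}
```

Each component comes from differentiating with respect to one variable while treating the other as a constant.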
The operator can also be applied to vector-valued functions, in two different ways, to produce either divergence or curl. Here, it is sometimes beneficial to represent the del operator in polar coordinates rather than in Cartesian ones as above.
Gradients, divergence, and curl will be studied in detail in future lecture notes, so stay tuned.
Gradients
There's a snow storm, and you're out climbing the Matterhorn.
As you're climbing the Matterhorn, the famous 'Toblerone mountain', there's suddenly a blizzard. You can't see anything. If you walk in the direction of the so-called gradient, you'll gain altitude the fastest.
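If you package all partial derivatives into one single vector, you get the gradient. The gradient is written $\nabla f$, and so
$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$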
The gradient lies in the xy-plane, and it points in the direction of steepest ascent. It's perpendicular to contour lines, as shown in the pic to the right.
If you fancy, you could create a vector field by drawing the gradient at each point. In such a gradient field, the length of each arrow corresponds to the magnitude of the gradient at that point, so long arrows mean steep terrain. Quite commonsensical, right?
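A gradient field like this can be sketched in a few lines of Python. The example assumes a made-up hill f(x, y) = -(x^2 + y^2) with its top at the origin. The function and the grid points are assumptions for illustration.

```python
import math


def grad_f(x, y):
    # Gradient of the hypothetical hill f(x, y) = -(x^2 + y^2)
    return (-2 * x, -2 * y)


def gradient_field(xs, ys):
    """Return {(x, y): (gradient, magnitude)} for every grid point."""
    field = {}
    for x in xs:
        for y in ys:
            g = grad_f(x, y)
            field[(x, y)] = (g, math.hypot(g[0], g[1]))
    return field


field = gradient_field([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
for point, (g, magnitude) in field.items():
    print(point, g, round(magnitude, 3))
```

At the top of the hill the gradient has length zero, and every other arrow points back towards the top, which is the direction of steepest ascent.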
Example
Find the gradient of the following function
in the point
Then you'd get something along the lines (pun intended!) of:
Directional derivatives
Rates of change in any direction
The partial derivatives of a function $f(x, y)$ are the derivatives of $f$ in the $x$ and $y$ directions. We have often denoted these directions by the unit vectors $\mathbf{e}_x$ and $\mathbf{e}_y$.
Now, we can actually take the derivative in any direction $\mathbf{u}$. We call the rate of change in the direction $\mathbf{u}$ a directional derivative.
To be able to compute this derivative, we need to have $|\mathbf{u}| = 1$. We also need $f$ to be differentiable at the point of interest.
The directional derivative tells us how quickly the function changes in the given direction.
We have already seen one example of directional derivatives: partial derivatives. They make up the gradient which can have any direction. However, it will always be parallel to the direction of most rapid growth of the function.
If we have a gradient, throwing out the $y$ component by setting it to $0$ gives us the directional derivative in the $x$ direction. This is equivalent to projecting it onto the $x$-axis, as the unit vector $\mathbf{e}_x$ is perpendicular to $\mathbf{e}_y$.
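Likewise, the directional derivative in any other direction is found by projecting the gradient onto the unit vector pointing in that direction. Concretely, we compute it with the scalar product:
$$D_{\mathbf{u}} f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u}$$
The directional derivative in the direction of $\mathbf{u}$ at $\mathbf{a}$ is denoted $D_{\mathbf{u}} f(\mathbf{a})$.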
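The projection can be sketched in Python as a dot product between the gradient and a normalized direction. The function f(x, y) = x^2 + 3xy, the point (1, 2) and the direction (3, 4) are made up for illustration. The finite difference at the end is only a sanity check of the dot product.

```python
import math


def f(x, y):
    # Hypothetical example function
    return x**2 + 3 * x * y


def grad_f(x, y):
    # Analytic gradient of f: (2x + 3y, 3x)
    return (2 * x + 3 * y, 3 * x)


def directional_derivative(grad, direction):
    """Project the gradient onto the unit vector along `direction`."""
    norm = math.hypot(direction[0], direction[1])
    u = (direction[0] / norm, direction[1] / norm)  # make it a unit vector
    return grad[0] * u[0] + grad[1] * u[1]


a = (1.0, 2.0)
direction = (3.0, 4.0)
d = directional_derivative(grad_f(*a), direction)

# Sanity check: central difference of f along the unit vector (0.6, 0.8)
h = 1e-6
u = (0.6, 0.8)
approx = (f(a[0] + h * u[0], a[1] + h * u[1])
          - f(a[0] - h * u[0], a[1] - h * u[1])) / (2 * h)
print(d, approx)  # both approximately 7.2
```

Normalizing inside the function means you can pass any nonzero direction without worrying about its length.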
The smallest directional derivative
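The directional derivative in the direction of the gradient is the one with the greatest absolute value. The other extreme is the directional derivative in the direction of the level curve at the point. If $\mathbf{v}$ is a unit vector parallel to the level curve at $\mathbf{a}$, then
$$D_{\mathbf{v}} f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{v} = 0$$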
Why? Well, the level curves are the curves along which the function stays constant. This is equivalent to saying its derivative is zero in that direction.
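As a quick check, take the made-up function $f(x, y) = x^2 + y^2$ at the point $(1, 1)$ (both chosen for illustration). The level curve through that point is the circle $x^2 + y^2 = 2$, and a unit tangent to it at $(1, 1)$ is $\mathbf{v} = \frac{1}{\sqrt{2}}(-1, 1)$.

```latex
\begin{align*}
\nabla f(x, y) &= (2x,\ 2y), \qquad \nabla f(1, 1) = (2,\ 2) \\
\nabla f(1, 1) \cdot \mathbf{v} &= \frac{1}{\sqrt{2}} \left( 2 \cdot (-1) + 2 \cdot 1 \right) = 0
\end{align*}
```

The scalar product vanishes because the gradient is perpendicular to the level curve.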
Rate of change in scalar fields
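We can regard the function $f$ as a scalar field, with $f(\mathbf{a})$ the value of the field at each point $\mathbf{a}$. Then,
$$|\nabla f(\mathbf{a}) \cdot \mathbf{u}|$$
gives the magnitude of the rate of change in the scalar field, in the direction of the unit vector $\mathbf{u}$.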
Example: walking down a mountain
Imagine you are walking down a mountain towards your car. Your speed is in the -plane. The shape of the mountain follows the function , where:
Your position is and the parking place position is at , and you walk a straight path between them. Calculate the rate of change in the height you experience per second, as you start your walk.
The key to solving this problem is to note that we are searching for the rate of change per second. This implies that the speed at which we are travelling matters for the end result.
The second thing to note is that the directional derivative gives us a measure of how the height of the mountain varies as we move in the $xy$-plane. Thus, we find our change in height per second by multiplying our directional derivative with our speed:
We start by calculating the unit directional vector of our path:
Next, we calculate the gradient of the scalar field:
Then, we calculate the directional derivative at the point (1, 1):
Finally, we calculate the rate of change in height per second. It turns out to be:
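The same four steps can be sketched in Python. Apart from the starting point (1, 1), all values here are assumptions made up for illustration and are not the numbers from the exercise: the height function h(x, y) = 100 - x^2 - 2y^2, the parking place at (4, 5) and the speed 2.

```python
import math


def grad_height(x, y):
    # Gradient of the hypothetical height function h(x, y) = 100 - x^2 - 2*y^2
    return (-2 * x, -4 * y)


def height_change_per_second(position, target, speed):
    """Speed times the directional derivative along the path to `target`."""
    # Step 1: unit direction vector of the straight path
    dx = target[0] - position[0]
    dy = target[1] - position[1]
    norm = math.hypot(dx, dy)
    u = (dx / norm, dy / norm)
    # Step 2: gradient of the scalar field at the current position
    g = grad_height(*position)
    # Step 3: directional derivative as a scalar product
    directional = g[0] * u[0] + g[1] * u[1]
    # Step 4: scale by the speed to get the change in height per second
    return speed * directional


rate = height_change_per_second((1.0, 1.0), (4.0, 5.0), 2.0)
print(rate)  # approximately -8.8
```

The result is negative, which matches the story: you are walking downhill.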