Hello, and welcome to the Maths 2 component
of the online BSc Program on Data Science and Programming. This video is about the direction
of steepest ascent and descent. So, the ideas in this video will be based on what we have
studied about gradients and directional derivatives. So, let us start with this question about
how to trace water flowing down the hill. So, maybe we will do this with an example.
So, for an electrification project, someone named Mohan Bhargava is trying to understand
how water flows down a particular hill, the name may be somewhat familiar. So how does
water flow downhill? So, let us ask this question, how does water flow downhill? So, if you think about how it flows downhill
it always moves in the direction where the altitude decreases most rapidly. So, if you
have a rock face, if you have something like this, it is going to flow down here because
this is most rapid. So, just as an example here is the, here is a cross section of the
Deccan Plateau around Bangalore and Chennai, and so you can see Bangalore here in the middle,
you can see Chennai here at very close to sea level on the right and there is the western
ghats on the left, and a little bit of inclination between Bangalore and then a steep downfall
towards Chennai. So, just as an example, if you have water,
which is somewhere over here it is going to flow down here. If you have water on this
side, it will flow down on this side. And let us say if you have water somewhere here
it will flow down here, if you have water here it will flow down here, if you are here
it flows towards Chennai and so on. So, I think it is clear what we mean by it will
move in the direction where the altitude decreases most rapidly. Now, then, of course, one question. So, what
happens if you have water somewhere over here? Somewhere exactly at the top. And there it
is actually kind of stable. So, if you have gone to mountain tops you might often find
that on the top of the mountain it is often like a plateau and there is, often pools of
water over there. So, that does not happen. When you have, when you are on the edges.
So, which say something about the rate of change, if you think of it carefully about
what is happening to the altitude. This picture is just to give a general idea of what will
happen to water. Let us come back to the example of our friend Mohan Bhargava. So, what Mohan Bhargava does is, he models
the hill similar to the previous picture using a function h of x, where h is the altitude.
And then he calculates the derivative h prime x and computes at which points it is negative,
positive and 0 and that is exactly what we meant when we said that it moves in the direction
in which the altitude decreases most rapidly. So, the decrease in the altitude will be captured
by the rate of change of the altitude. And so, that means h prime x is going to tell
you what is happening at the point x or to the altitude. And so, if the derivative is
negative that means the water flows to the right from that point. So, if we say it is
negative that means the the the the tangent line is like this. And if the tangent line
is like this, it flows down here, which is to the right. And similarly, if the derivative is positive
then the tangent line is like this. So, it flows down here, that is to the left, and
if the derivative is 0, then that means the tangent line is like this, and that means
the water will remain stationary. And that is exactly what the phenomenon that I was
talking about, right at the tip of the mountain at the top. So, this is what Mohan Bhargava
tries to do, he tries to model the altitude by this function h of x draws a picture similar
to what we had before, although, not for the entire cross section of India, but for that
specific hill, and then then based on that he tries to understand how water is flowing
down. So, Mohan s friend Gita points out that a
one-dimensional model is probably not going to help because the hill is two dimensional,
so it has, it is like this and so a one-dimensional picture is not going to help us because there
are more than right and left, in real life there are on the ground. That is there are
many directions, not just right and left. So, she uses a two-dimensional function h
of x comma y to model the altitude h. And here is an example of such a thing. So, you
can see something like a hill, and this is the graph of a function and suitable exponential.
And so, what she comes up with is a function of this type h of x, y the actual hills have
more contours and she models it using that function h of x, y. And based on that she
tries to understand how the water is going to flow down. So, Mohan asked her, how she will find out
the direction in which the water flows, because now there is more than two directions. So,
if you had only a function of one variable h of x then computing the derivative told
you in what direction the water is going to move, but now you have several directions.
And how do I know now based on this function how the water is going to flow down? So, Gita says the same principle that the
water will flow down along the steepest slope, which is when the altitude decreases fastest.
So again, the direction in which the altitude decreases fastest. And now, how do we compute
this? So now, rather than saying that we should take h prime, we know that if we want to know
the rate of change in a particular direction, we have something called the directional derivative. So, she says that, to compute it, we have
to find the direction in which the function h decreases fastest, or equivalently, the
rate of decrease of h of rate of decrease of h, is fastest. So, this is the same as
finding the vector u in which the directional derivative hu is largest in absolute value
amongst those for which it has negative sign. So, hu should have negative sign, because
we want to decrease, we wanted to decrease h has to decrease fastest, so hu should have
negative sign, but amongst those with negative sign, those directions with negative sign,
if you take the absolute value, it is the largest. So, in terms of the real line, this will be
the value at which hu is smallest in which we include the negative side, so hu must be
smallest, that is what we are saying. So that is u such that hu is less than or equal to
hv for all v in R2, so that is the smallest value and we also demand that hu is negative,
because if it is smallest and positive then that is not going to be of much help. So,
this is what we want. So, Gita has now given a different way of doing the same thing, and
a more realistic way of doing the same thing. So, now we come to the main point of this
video, which is, how do we know that direction? So, now we will generalize this. So, suppose
we have a function of n variables and it is defined on a domain D in Rn, containing some
open ball around the point a tilde, so we will give a general answer and then we will
come back and answer it in this specific question of what direction will the water flow in. So, we want to know with this hypothesis in
what direction is the directional derivative minimized? So, suppose gradient f exists and
is continuous on some open ball around the point a tilde. So once this happens, we have
this wonderful theorem that tells us that if you want to compute fu, then the way to
do that is that fu is equal to gradient f at that point a tilde dot with u. So, this
is equal to the norm of gradient f at a tilde times the norm at u times cosine of theta.
Where what is theta? So, where theta is the angle between the gradient
at a tilde and u. We have two vectors, so we have seen this in the linear algebra part
that if you have the dot product then that is norm of the first vector times norm of
the second vector times cosine of theta, where theta is the angle between those two vectors.
Since it is a unit vector we know that norm of u is 1, so this is the same as gradient
of f at a tilde times cosine of theta. Now, this function f is fixed, so the gradient
is fixed and the gradient at a tilde in particular is fixed and so its norm is fixed. So, as you vary across the different directions
what changes is theta. So, this is going to change depending on what is theta. So, as
theta changes fu is going to change. So as u changes theta changes, and based on that
fu is going to change. So, where is this going to be maximized and where is this going to
be minimized? So, now that depends on the values of cosine
theta, and cosine theta we understand very, very, very well. So, cosine theta is smallest,
so cosine theta is minimized when theta is equal to pi. So, in other words, so if you
think of what that means, that means the angle between u and the gradient vector is pi that
means they are in opposite directions. If gradient is pointing here then you must be
pointing here. So that is u is pointing in the direction
opposite to gradient f at a tilde. So, this minimum value therefore, the minimum
value of fu is attained when u is equal to minus gradient of f at a tilde by norm, because
we always want a factor of u has to be of norm 1. So, u is in the opposite direction
means u is minus of the minus of gradient it is in that direction, and then to scale
it you have to divide by the norm to make it norm 1. So, the minimum value of fu is attained when
u is this and is equal to. So, now you can substitute u to be this. So, if you do that
or you can just say that cosine theta is minus of this so it equals minus of norm of gradient
of f at a tilde. So, the upshot of this is that in general if you have the directional
derivative, the direction in which it is minimized when your function happens to have this property
that gradient f exists and is continuous on some ball around a tilde is in the opposite
direction to gradient f at a tilde. And so, the unit vector you have to that appears
there is minus gradient f because that is the opposite direction divided by its norm,
because it has to be norm 1. And when you substitute u to be that, which is the same
as taking cosine theta is minus 1 in this expression we will get minus of norm of gradient
of f at a tilde. So, now, this answers the question that was
originally posed. So, if you had the water flow in the steepest direction downwards,
it is going to happen at a gradient of h at the point x comma y minus of that, in the
minus of that direction. So, at each point you should take minus gradient of f h at x,
y and that will be the direction in which the water moves. So, one can ask the same question about when
is it maximized and when is it, when does it remain unchanged. So, based on our previous
discussion, we are going to have the same thing happening provided of course, that the
hypothesis holds. So, assume the same hypothesis as in the previous slide which means that
the gradient exists and is continuous on a open ball around a tilde. So, in that case,
we will get there h, sorry fu is going to be gradient of f at a tilde times norm of
u, times cosine theta. Now, if you want to maximize this, so it is
maximized and cosine theta is maximized. Well, first of all, I can get rid of norm u because
it is always norm 1, times cosine theta it is maximized when theta is equal to 0, which
means u is in the
same direction as gradient at a tilde. So, in other words, u is equal to gradient of
f at a tilde by its norm. And this also justifies why we call it the
gradient. So, the gradient tells you, the word gradient tells you the slope at a point
that is what the gradient means, in English in colloquial language. And so, what we are
getting is that the gradient actually tells you that at a point what is the steepest slope.
So, if you are standing at a particular point on a mountainside the slopes could vary in
different direction. So, the gradient tells you the direction in which you have the steepest
slope. And we can also answer the second question,
which is when does it remain unchanged? So, it remains unchanged when the rate of change
is 0, which means when fu is 0, so that is when cosine is 0. And when is cosine 0? So
cosine is 0 when theta is equal to pi by 2 that is theta is equal to pi by 2, which is
the same as saying that u is orthogonal to or perpendicular because here we are using
Euclidean norm and so on, so that is perpendicular or at 90 degrees to the gradient vector. So, we have answered these three things. When
is it minimized in the opposite direction to the gradient vector? When is it maximized
when it is in the same direction as the gradient vector? And when is the directional derivative
constant when you move in a direction perpendicular to the gradient vector. Now, I want to warn you that in two dimensions,
of course, saying perpendicular to the gradient vector means that you go either, so it is
a line, you go along this way or this way, but we are in n dimensions. And in n dimension
that means there is an equation that you have to solve. And if you do your linear algebra,
that means you have a subspace of dimension n minus 1. And we will not talk a lot about
that in this video, but this may come up later. So, this is this the, these are the answers
to these questions. So, we have completely answered the original question, which was
in what direction does water flow? On the other hand, if you want to ask, you are an
adventure seeker and you want to go in the direction where the slope is steepest, you
want to do rock climbing and you want to climb up fastest, so what direction should you go
in, the direction of the gradient vector, that is what we have understood from here. And, on the other hand, if you have, if you
do not want to change your altitude at all. You want to keep going at the same altitude
and just hope that you do not have to climb down or climb up. And maybe if you go around
then you will hit a plateau and you can go you up, you do not want to do any climbing
up or down. Then you go in the direction in which the gradient is perpendicular, so you
go in that direction. Fine. So, just to summarize what we have said. So,
assuming the hypothesis as in the previous slide the property of steepest ascent is in
terms of directional derivative that is saying that fu is positive and maximum and that takes
place when u is equal to gradient of f divided by its norm. The property of steepest descent occurs in
terms of directional derivative when fu is negative and minimum and the direction is
the negative of the gradient, and you divide by its norm to make it norm 1. And the property
of no change in the height in the altitude or which means in terms of functions no change
in the function. So that occurs in terms of directional derivatives when fu is 0, and
that is a direction orthogonal to gradient of f. So, this summarizes what we have done
so far. Let us do a couple of examples to put this
in perspective. So, suppose I have the function f x, y is sin x y, let us compute the gradient.
We have actually done this computation before, so the gradient at x comma y is so at x you
differentiate with respect to x, so that gives me y times cosine of x, y, and then x times
cosine x, y, that is with respect to y. So, let us choose a particular point and evaluate
over there what happens. So, suppose I choose the point, let us say
pi comma 1, so at pi comma 1 what is the direction of steepest descent on the graph of this function?
So, let us first evaluate the gradient vector at this point. So, if you take pi comma 1,
so this is 1 times cosine of pi, comma pi times cosine of pi. Now cosine of pi is minus 1, so this tells
us that this is minus 1 comma minus pi, so the direction in which you should move is
opposite to this. So, u is going to be minus gradient f at pi comma 1 divided by its norm,
which is 1 comma pi divided by root of 1 plus pi squared. So, this is the direction in which
you should move. If you want to, for example, move in the direction
where of steepest ascent, where the function increases the fastest, the direction in which
the function increases the fastest, then you should move along minus 1 comma minus pi,
the unit vector in that direction. And if you want to move along the direction where
it does not change, so then you should move perpendicular to this vector, which is the
vector pi comma minus 1 or minus pi comma 1, so in either of those directions. Let us similarly do this example. So, here
we compute the gradient, so that is 2x comma 2y comma 2z. And so, suppose I want to now
ask the same thing about let us say at 1 comma 1 comma 1 what is the direction in which the
function increases fastest? So, the answer is in the direction of the gradient vector. So, the gradient at 1, 1, 1 is 2 comma 2 comma
2. And so, it should be in the direction, so in the direction of u equals 2 comma 2
comma 2. And because we typically make this a unit vector we should divide by its norm.
So, 2 squared is 4 times 3 is 12, so this is 1 over root 3, 1, 1, 1. So, instead if
you wanted the direction of no change, so in which direction does the function remain
constant? So, for this, we will have to move perpendicular
to the vector 2, 2, 2. Now there is many choices for the perpendicular direction. So, this
will be any vector which is on the plane x y z such that 2x plus 2y plus 2z is 0. So,
I will encourage you to work out some vectors. So, examples of such would be 1 comma minus
1 comma 0, of course, we should take the unit vectors, so, I will divide by root 2 or you
could have 1 by root 2 comma 1 comma 0 comma minus 1. And, in fact, this is a basis for those directions.
So, any linear combination of these two would be perpendicular to this direction, and that
will be a direction in which the function remains constant. So, I hope the idea here
is clear. I want to emphasize that this this video is extremely important because this
is exactly what you are going to use when you do something called gradient descent in
machine learning. So, in machine learning, you will have a phenomenon
called gradient descent where you try to keep decreasing something or increasing something
decreasing usually, and so, at a when you are at a particular solution you try to minimize
some quantity so as to get what we will call a better solution. So, so please, yeah, please
understand this video very well. I want to end on this cautionary note, since
we have been doing this example for a while, now, I want to again emphasize this example
to say that we need to know the continuity of the gradient function. So, recall that
for this function, the gradient of f at 0, 0 was 0 comma 0, but the directional derivatives
at 0, 0 other than the x and the y axis meaning the i and j these did not exist unless use
e1 or e2 or minus e1 or minus e2, which will be just negative of these. So, this thing that we used earlier that fu
is equal to gradient dot u, and which we use in order to compute the direction in which
it is maximum or minimum that here it is none of those things are applicable. So, you need
to know that your gradient is in fact continuous in a small neighborhood. And so, it may happen
that the function is actually has edges. So, it is jagged over there. And if that happens
then you cannot use any of what we are doing here. So, in such places, the gradient is
not going to be, the partial derivatives will not be continuous, and then we cannot apply
this theory. So, it may still happen that there is a direction. For example, if it is
like this and inclination is more over here you do not want to go in this direction, but
we cannot say that from the gradient. So, that condition is important. So, let us
summarize quickly what we have done in this video. So, we have seen this very important
idea that, if you have a function f for which the gradient functions continuous, then in
order to compute the direction in which the function increases most at a point that is
the gradient at that point, the direction in which the function decreases the most rapidly
at that point, that is the negative of the gradient direction, and the direction in which
the function stays constant that is orthogonal or perpendicular to the direction of the gradient
at that point. Thank you.