Mathematics of Regression Explained

Hello all. Today we will be discussing the maths intuition behind a regression problem statement. In my previous videos, I have already shown a lot of practical application examples for simple linear regression and multiple linear regression. In this video, I am going to give a detailed explanation of the maths behind regression.

So, everybody remembers that linear regression, in its simple form, is basically given by the equation y = mx + c, and this is my best fit line. Here m is the slope and c is the intercept. Let me first discuss what this means.

Suppose I have a problem statement where, with respect to the size of a house, I have some prices in my data set. So the points will be populated somewhere like this. What we are trying to do with a regression algorithm is create a best fit line such that, for some future size, I can find out what the price of the house may be. I can go to that size on the x-axis, move up to the best fit line, and read off the price. So this is my best fit line and, as discussed, it is indicated by the equation y = mx + c.

Now, what do the m value and the c value mean? Always remember: whenever your size is 0, you are basically considering that your x value is 0. If I put x = 0 into this equation, my y will basically be c, which is the constant, or the intercept. So c tells you where the line meets the y-axis: it is the price that the line gives when the size is 0.

The other thing I want to discuss is m. Suppose I have 1000 square feet here and 1100 square feet here. If I consider 100 square feet as my unit change on the x-axis, then with this unit change, what is the change in the price on the best fit line? That is what we consider as the slope. So the slope basically indicates what the change on the y-axis (my price) will be for a unit change on my x-axis (my size).
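
As a rough sketch of what the intercept and the slope mean, the snippet below evaluates a best fit line at a few sizes. The values of `m` and `c` and the function name `predict_price` are hypothetical, chosen only for illustration; they do not come from any real data set.

```python
# Hypothetical slope and intercept, assumed only for illustration.
m = 150.0      # change in price per extra square foot (assumed)
c = 20000.0    # price the line gives when the size is 0 (assumed)


def predict_price(size_sqft):
    """Best fit line: y = m * x + c."""
    return m * size_sqft + c


# When the size is 0, the line gives exactly the intercept c.
intercept = predict_price(0)

# Moving from 1000 to 1100 square feet changes the price by 100 * m.
change_per_100_sqft = predict_price(1100) - predict_price(1000)

print(intercept, change_per_100_sqft)
```

The first value is the point where the line meets the y-axis, and the second is the slope scaled by the 100 square feet step on the x-axis.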

Now, you may be wondering how we pick the best fit line. One way is that I can draw multiple candidate lines like this. For each one, I look at the distance between each real point and the line, that is, the error, and what I should try to do is minimize it, such that if I do the summation of all the errors, it is minimal. Whichever line gives us the minimal total error will give me the slope value, m, and the intercept value, c.

Now I'll just clear this diagram and explain it properly. Suppose I have these X and Y values, where X is my size and Y is my price.

And I have a lot of points populated here. From these, suppose I found out that the best fit line is this one. Now, to decide which line to select as the best fit line, I will define a function which is called the cost function. Again, guys, this cost function is very, very important.

This cost function is also a basic thing that we'll be learning in deep learning, so it is very important to understand. Make sure that you watch this video till the end.

As we said, the distance between the point on the best fit line and my real point should be minimal. So I can write the cost function as 1 by 2m, multiplied by the summation from i = 1 to m of (y hat minus y) whole square. In this formula, m means the number of points, not the slope. You know that my y hat is given by y hat = mx + c: the points that you predict on the best fit line are your y hat, and y indicates the real points. So we should try to minimize this error, and whichever line gives you the minimum error is selected as the best fit line. But now the next question arises: just by using this equation, how do I find out the best fit line?

Because I can have multiple candidate lines, right? I can have a lot of them. And then for each one, I have to compute the summation and try to find out which gives the minimal value. That will not exactly work.

That will take a lot of time and will unnecessarily waste processing power. You can't just select a million lines and compute the cost function for each one. Instead, I'll show you a more efficient way.
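
Before moving on, here is a minimal sketch of the cost function described above, together with the brute-force idea of scoring a handful of candidate lines. The house sizes, prices and candidate `(m, c)` pairs are made-up values, and the number of points is written as `n` rather than `m` so that it is not confused with the slope.

```python
def cost(m, c, xs, ys):
    """Cost = 1 / (2n) * sum of (y_hat - y)^2, where y_hat = m * x + c."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = m * x + c          # point predicted by the candidate line
        total += (y_hat - y) ** 2  # squared error against the real point
    return total / (2 * n)


# Hypothetical data: sizes in square feet and prices (assumed values).
sizes = [1000, 1100, 1200, 1300]
prices = [170000, 186000, 199000, 216000]

# Brute force: score a few hand-picked candidate lines and keep the best.
candidates = [(140.0, 30000.0), (150.0, 20000.0), (160.0, 10000.0)]
best = min(candidates, key=lambda mc: cost(mc[0], mc[1], sizes, prices))

print(best, cost(best[0], best[1], sizes, prices))
```

This only finds the best line among the candidates that were tried, which is why scanning lines one by one does not scale and a more directed search is needed.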

So to begin with, I'm going to give you an example again. Suppose I am considering X and Y. This is my x-axis and this is my y-axis, and on each I have the values 1, 2, 3 and 4. Now let me consider some points. When my x value is 1, my y value is also 1, so I'll draw my first point over here. When my x value is 2, my y value is also 2; that is my second point. When my x value is 3, my y value is also 3. Suppose these are my three points. This is my real data, and this real data is basically given by the equation y = x, because when my x value is 1, my y value is 1, and so on.

Now the next thing is that I need to find out my best fit line for these points. So I'll write the equation as y hat = mx + c. Let me consider that my c value is 0. The reason is that I'll look for the best fit line through these points, and I'll consider that the line passes through the origin.

You know the origin: when my x value is 0, my y value is 0. And there is another reason why I'm making c equal to 0: I'm going to draw a diagram over here, and if I make c equal to 0, I will be able to draw a 2D diagram. If I let c take some other value, I would basically have to draw a 3D diagram, and a 3D diagram would definitely be very difficult for me to draw over here.

So I'm considering the c value as 0, indicating that the line passes through the origin, where my x value is 0 and my y value is 0. When I make the c value 0, my new equation is y hat = mx.

Now, this y hat basically indicates my best fit line. So for x equal to 1, let me put x = 1 in and find my y hat value. And let me consider my slope: initially, I'm just initializing my m as 1. So let me put that into the equation.

When my m is 1, my y hat is basically 1 multiplied by 1, so y hat will be 1. So after the origin, my best fit line will pass through this y hat. At this particular point I have both y hat and y. Now, for x equal to 2, what will be my y hat?

You know that my slope is 1 and my x value is 2, so it will be 2. So the line gets extended and passes through this point. Similarly, when my slope is 1 and my x value is 3, my y hat will be 3, so it will pass through this point as well. So this is my best fit line when my slope, my m value, is 1. Very important.

Now, after I get this line, I will find my cost function. Guys, remember, I have already discussed the cost function, and the formula is 1 by 2m, multiplied by the summation from i = 1 to m of (y hat minus y) whole square. This is the value I have to reduce.

I have to find out this error and try to reduce it. Now, let me put the values into the summation. First of all, when my x value was 1, what was y hat?

y hat was 1. And what was my y value when x is 1? The real y value is 1. So I'll write (1 minus 1) whole square. Then, when my x is 2, my y hat was 2, so this will be plus (2 minus 2) whole square, plus (3 minus 3) whole square. And in 1 by 2m, m is the number of points, and I have 3 points, so m is 3.

So 1 by 6 multiplied by 0 is nothing but 0. So now this is very, very clear.

When my slope was 1, for these points, I got the cost function as 0. So what I'll do is draw one more diagram. And this is the most important diagram, guys, so please focus on this. So here it is.

On the y-axis I have 0.5, 1, 1.5, 2.0, 2.5. This y-axis indicates my cost function, which I'll write as J(m), where m is the slope. The x-axis is my slope, my m value, and here also I will write 0.5, 1, 1.5, 2. Now, everybody please focus on what I am trying to do in this diagram: for every m value that I initialize, I'm going to plot the cost function that I get.

Initially, with the m value as 1, I got my cost function as 0. So where my m value is 1, my cost function is 0, and this is the point that I'm going to plot. I hope it is pretty clear. Now, in my next step, I'll change this m value.

Suppose I take my m value as 0.5. With m equal to 0.5, if I put the values into this equation, my y hat for x equal to 1 will be 0.5, my y hat for x equal to 2 will be 1, and my y hat for x equal to 3 will be 1.5. See how I'm getting this, guys? You just have to put the values in. When my slope is 0.5 and my x is 1, 0.5 into 1 is 0.5, right?

When my x is 2, 0.5 into 2 is 1. Then, when my slope is 0.5 and my x value is 3, 3 into 0.5 is 1.5. So I will be getting new points somewhere like this: one, two and three. When I draw my best fit line through them, it will look like this.

Now let me find the cost function when my m value is 0.5. You just have to put the values into 1 by 2m multiplied by the summation from i = 1 to m. For x equal to 1, my y hat was 0.5 and my y was 1, so that is (0.5 minus 1) whole square; similarly, plus (1 minus 2) whole square, plus (1.5 minus 3) whole square. The m in 1 by 2m is 3; here m is nothing but the number of points, and you can just write it as n if you are getting confused. If you compute this, you get 3.5 by 6, which is around 0.58. So when your slope is 0.5, your cost function is 0.58, and your next point will be plotted somewhere here.

Similarly, for different m values you'll be getting points which form this kind of curvature, and when you draw it you get a diagram which looks something like this. This curve is what we basically refer to as the gradient descent curve.
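
The points on this curve can be reproduced with a short sketch. It uses the three data points from the example, (1, 1), (2, 2) and (3, 3), keeps c fixed at 0, and evaluates the cost for a few slopes. The function name `cost_for_slope` and the particular list of slopes are assumptions made for illustration.

```python
# The three real points from the example, which lie on y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]


def cost_for_slope(m):
    """J(m) = 1 / (2n) * sum of (m * x - y)^2, with the intercept fixed at 0."""
    n = len(xs)
    return sum((m * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Evaluate the cost for several slopes to trace out the curve.
for m in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(m, round(cost_for_slope(m), 2))
```

The slope 1 gives a cost of 0 and the slope 0.5 gives about 0.58, matching the two points plotted above, and the costs rise symmetrically on either side of 1, which is what gives the curve its bowl shape.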
Now, this gradient descent plays a very important role, guys, which I'm going to explain on the next screen. Once you get this curve, how do you know when to stop and select an m value that is good for the best fit line? That is the next thing I'm going to discuss. Before that, I'm going to clear this diagram and focus on the curve. As I said, this axis is my m value and this axis is my cost function, J(m). Here I'll write 0.5, 1, 1.5, 2, 2.5, and similarly here I'll write 0.5, 1, 2, 2.5, 3. Suppose I'm getting these points populated like this, and this point like this. I'm just going to draw my gradient descent curve again.

It may not be exactly correct, but I'm just trying to draw this diagram properly for you. So here it is: it looks something like this. Okay guys, so this is basically my gradient descent curve, which I have drawn again, and I'll label it as gradient descent. In my previous diagram I already showed you that, based on different m values, we get different points.

And finally, they follow this particular shape. The next question is how we arrive at this region at the bottom, which is called the global minimum. I need to arrive at this position.

So suppose that, based on some initial m value, I got my initial point somewhere over here. When I get my initial point over here, that basically means that I have to move downwards, right?

So in order to move downwards, I will write a rule which is called the convergence theorem. This convergence theorem basically says that the new m value is m minus the derivative of the cost function J(m) with respect to m, where this derivative is multiplied by one more value called the learning rate, which is indicated by alpha. So: m = m minus alpha times dJ/dm. Now let me show you why this equation works. This derivative is basically the slope of the curve at the current point.
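
The update rule can be sketched for the three-point example with c fixed at 0. The derivative used here, dJ/dm = (1/n) * sum of (m * x - y) * x, is obtained by differentiating the cost function stated earlier; it is not spelled out in the video, and the function names are assumptions for illustration.

```python
# The three real points from the example, which lie on y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]


def slope_of_cost(m):
    """Derivative of J(m) = 1 / (2n) * sum of (m * x - y)^2 with respect to m."""
    n = len(xs)
    return sum((m * x - y) * x for x, y in zip(xs, ys)) / n


def update(m, alpha):
    """One step of the rule: new m = m - alpha * dJ/dm."""
    return m - alpha * slope_of_cost(m)


# Left of the minimum the slope is negative, so the step increases m.
print(slope_of_cost(0.5), update(0.5, 0.001))

# Right of the minimum the slope is positive, so the step decreases m.
print(slope_of_cost(2.0), update(2.0, 0.001))
```

In both cases the sign of the derivative makes the step point towards m = 1, which is why subtracting the scaled slope moves the point down the curve.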

Now suppose that for some m value I got to this particular point, and then I apply my convergence theorem. The convergence theorem says that I have to subtract the slope at this point. To find the slope, you just draw a straight line touching the curve at this point, like this, and this straight line helps you find the derivative. Next, I have to find out whether this is a positive slope or a negative slope; that is important. How do you find out? Look at the right-hand end and the left-hand end of this line. If the right-hand end is pointing downwards, you can say that this is a negative slope.

Now, suppose that at this point my m value was somewhere like minus 0.5, while the m value you want is somewhere around 1. This point has a negative slope, so the derivative will be a negative value. The alpha, or learning rate, will be a small value, something like 0.001; I'll tell you why we have to select a small value. So the derivative multiplied by the learning rate is a negative value that is very small in magnitude. When I subtract it, I get m plus some small positive value, because minus into minus is plus.
So what happens is that my m value increases from minus 0.5 and comes nearer to 1. This step will be very, very small, and as the iterations go on and different m values get selected, the point will keep moving towards the global minimum.

Now, if I select the learning rate as a larger value, like 1, what will happen is that instead of taking this small step, the point may jump to some other point like this. It may take a longer jump, and it may not reach the global minimum even after many iterations. For that reason we usually select the learning rate as a very small value.

Now let me consider that I selected a random m value and I got the point somewhere here.
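
The effect of the learning rate can be seen in a small experiment on the same three-point example with c fixed at 0. This is only a sketch: the starting value of minus 0.5 and the two learning rates 0.001 and 1 come from the discussion above, while the number of steps and the helper names are assumptions.

```python
# The three real points from the example, which lie on y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]


def slope_of_cost(m):
    """Derivative of the cost with respect to m, intercept fixed at 0."""
    n = len(xs)
    return sum((m * x - y) * x for x, y in zip(xs, ys)) / n


def run(m, alpha, steps):
    """Apply m = m - alpha * dJ/dm repeatedly and record every m value."""
    history = [m]
    for _ in range(steps):
        m = m - alpha * slope_of_cost(m)
        history.append(m)
    return history


small = run(-0.5, 0.001, 5)  # small learning rate: tiny steps towards 1
large = run(-0.5, 1.0, 5)    # large learning rate: jumps across the minimum

print(small)
print(large)
```

With the small learning rate every step lands slightly closer to m = 1. With the learning rate of 1, on this data, each jump overshoots and lands farther from 1 than before, so the point never settles at the minimum.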

Suppose I got it somewhere here. When I get the point over here, if I find the slope, or the derivative, at this point, I'll see that the right-hand end is pointing upwards and the left-hand end is pointing downwards. So this is basically a positive slope.

And when I find the derivative for a positive slope, the derivative will be a positive value. Then I multiply it by my learning rate, so the update is nothing but m minus some small value. Suppose initially my m value was 2 and I have to reach 1: the update subtracts a small value, and we move towards 1. So it is very important to understand that our learning rate should be very, very small, and this convergence theorem is very important for reaching the global minimum.

As soon as the point reaches here, at the global minimum, if I find the slope, the slope will be 0. When the slope is 0, m minus alpha times 0 is just m, so the m value stops changing. Until then I have to keep following the convergence theorem. Once I get to this point where the slope is 0, my algorithm will consider this m value as the slope of the best fit line, and that is the point where I stop training. This covers the whole explanation, the theoretical concepts along with the maths, of the linear regression algorithm.

Now, the next thing: suppose I don't have only one independent feature, I have multiple independent features. In that case, my gradient descent diagram will look like a 3D or a 4D diagram, based on the number of features, and each and every feature's coefficient will move towards the global minimum point.
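
Putting the pieces together, here is a minimal sketch of the whole procedure with a stopping rule. It updates both m and c, which corresponds to the 3D case that was skipped earlier by fixing c at 0, and it stops once both slopes of the cost are close to 0. The learning rate of 0.1, the tolerance, the starting values and the step limit are all assumptions picked so the example finishes quickly; they are not values from the video.

```python
# The three real points from the example, which lie on y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]


def gradients(m, c):
    """Slopes of the cost 1 / (2n) * sum of (m * x + c - y)^2 in m and in c."""
    n = len(xs)
    errors = [(m * x + c) - y for x, y in zip(xs, ys)]
    dm = sum(e * x for e, x in zip(errors, xs)) / n
    dc = sum(errors) / n
    return dm, dc


def fit(alpha=0.1, tolerance=1e-6, max_steps=100000):
    """Repeat the update until both slopes are nearly 0 (assumed settings)."""
    m, c = 0.0, 0.0
    for step in range(max_steps):
        dm, dc = gradients(m, c)
        if abs(dm) < tolerance and abs(dc) < tolerance:
            break  # slope is practically 0: stop training here
        m -= alpha * dm
        c -= alpha * dc
    return m, c, step


m, c, steps = fit()
print(m, c, steps)
```

On this data the loop should settle very close to m = 1 and c = 0, the line y = x. In practice the slope never becomes exactly 0, so a small tolerance stands in for the "slope is 0" condition.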
I hope you liked the step-by-step process and how we derived it. Please go through this video once again from the start. Now, after understanding all this, how do you implement it? I will be attaching a link, which you can see in the top right corner, where you can see the implementation of simple linear regression and multiple linear regression. I hope you liked this video. Please do subscribe to the channel if you have not subscribed. And yes, I'll be coming up with some new videos where I'll be discussing more of these mathematical concepts and deriving everything in front of you.

OK, so God bless you all. Keep learning. You're doing a great job.

And thank you for supporting my channel. Thank you one and all. Have a great day.
