[Music] Hello and welcome. Today we'll learn about linear regression, our first machine learning model.

What is a model? I know what you're thinking, but that's not the kind of model we generally refer to in data science. A model is basically a mathematical representation of a real-world process in the form of an input-output relationship. Something like this: the slices of pizza I'll eat, my output, will be determined by the hours since my last meal, which is the input. That means if it has been two hours since my last meal, I can eat five slices of pizza. Even though it is useless, this is an example of a simple model.

But why should someone make models? Not only do models help us understand the nature of the process being modeled, they also enable us to predict the output based on the input features, and the ability to predict the unknown has great economic value.

Let's look at an example. This is the size of a house, and this is the corresponding price of that house. These are observations for individual houses, also known as training examples. Now we want to know the price of a new house, which will be the output of our model, based on its size, which is the input. To start looking for a simple and yet effective model for this problem, our first stop would be linear regression.

Can't believe this is the same problem? It looks good on a graph. On the x-axis we have our input, the size of the house, and on the y-axis the output, its price. Any linear relationship between two variables can be represented as a straight line, whose equation can be written as y equals a0 plus a1 x, where y is the output or target, which is the price of the house in our case, x is the input or feature, which is the size of the house, and a0 and a1 are the model parameters.

But what if there is more than one feature? Any general linear equation with multiple features can be written as y equals a0 plus a1 x1 plus a2 x2, and so on up to an xn, where the xi's, that is x1, x2, up to xn, are the features, the ai's, that is a0, a1, and so on up to an, are the model parameters, and y is the target variable.

As per our
linear regression model, we need to fit a straight line to the data, with equation y equals a0 plus a1 x. Depending on the values of a0 and a1, we can have many possibilities which look promising. We need to settle on values of the parameters a0 and a1 for which the straight line fits the data best. For this, we need to agree on a metric to judge the best fit, and then we can choose the straight line which performs best on that metric. Cost function to the rescue.

Let's suppose we have m training examples or observations, and this is the first one. This is our model. From this we know that this is the actual price of the first example, and this is the price of the first example as predicted by our model, which falls on the straight line for that size. Let's call this difference the error term e, which is y predicted minus y actual. Since it is for the first example, let's call it e1. For the ith example, we define the error term as ei equals y predicted minus y actual.

Now, as you might have understood, ei can be positive or negative depending on whether y actual or y predicted is larger. We will square the ei's to make them positive, so the order doesn't matter. The cost function will be defined as 1 by 2m times the sum of e1 squared plus e2 squared and so on till em squared, where m is the number of examples and the ei's are the error terms. We can also write it as 1 by 2m times the summation of y predicted minus y actual, squared; we just expanded ei to y predicted minus y actual. We can also write it as 1 by 2m times the summation of a0 plus a1 x minus y actual, squared; we just expanded y predicted, which is a0 plus a1 x as per our model. Clearly, the cost function J is a function of the parameters a0 and a1.

I think you would have guessed it by now: the best-fitting model is the one which minimizes our error metric, the cost function. Such a straight line will be the best linear approximation of the relationship between house price and size of the house. Another interpretation of the cost function is as a measure
of the distance of our model from the data points: the smaller the distance, the better our model.

Now we just need to minimize the cost function. But how do we do that? We have established that all the straight lines are just different combinations of the model parameters a0 and a1, and the cost function is a function of those parameters as well. Therefore, by changing a0 and a1 we can change the cost function. We will keep changing a0 and a1 till we find a combination where the cost function is minimized, and for this we will take the help of the gradient descent algorithm.

Let's forget the cost function for some time. Assume a function, any regular function, J equals f of a. This curve represents the values J takes for different values of a; indeed, that is what makes it a function of a. We are at this point now, a1 and f of a1, and we want to reach here. We want to know for what value of a the function takes its minimum. How can we reach the minimum starting from a1? [Music]

Let's calculate the slope at this point, a1, f of a1. It is dJ by da at a1, or f dash of a1. Don't worry if you don't understand dJ by da; you can interpret it as the slope at that point. Let's take a step downhill, of size alpha times the slope, to reach a1 minus alpha times f dash of a1. Alpha is a small fixed quantity, around 0.01. Therefore the size of our steps depends on the slope f dash of a1: the higher the slope, which occurs away from the minimum, the larger the steps, and vice versa. As we move closer to the minimum the slope decreases, and hence our steps towards the minimum keep getting smaller. At the minimum, the slope becomes zero. These iterations of the gradient descent algorithm can run into the thousands or tens of thousands, depending on the nature of the function, our learning rate, and of course where we start from.

Even in this space we will use the same methodology to minimize the cost function, which is a function of the model parameters a0 and a1, by changing them through iterations of the gradient descent algorithm.

This is the part where it can get too mathematical for some
people, but don't worry if you don't get it completely; you'll be working just fine without it as well.

Step one in the gradient descent algorithm is to calculate the slope of the cost function with respect to both parameters separately, at the current or initial values of a0 and a1. Next, we take the step alpha and update the parameters: each parameter becomes its current value minus alpha times its slope. The third step is to recompute the cost function with the new a0 and a1, and then repeat from step one. These iterations should be run thousands of times.

We can easily scale this to multiple features. As you know, our equation of linear regression for multiple features is y equals a0 plus a1 x1 plus a2 x2, and so on till an xn. We just have to update all the model parameters by calculating the slope and then implementing the gradient descent step for every parameter simultaneously.

Understanding the theory behind the cost function and the gradient descent algorithm was essential, but as we shall see, we can train a linear regression model with just a few lines of code in Python. There are inbuilt packages which implement all of this in an optimized way, so that we don't have to worry about all the maths behind it.

One important thing to understand about gradient descent is the learning rate alpha. We know that the step taken towards the minimum was alpha times f dash. So as we increase the learning rate, we will take larger steps towards the minimum and hence reach the minimum earlier. This will make our algorithm fast, right? Let's suppose we set alpha to 100, and we reach here in one step from a1. With alpha set to 100, can you guess what can potentially happen in the next iteration? It will fail to converge to the minimum and will keep oscillating around it, never converging even in a billion iterations. So what is the solution? Keep alpha small. Its value is kept around 0.01, so that it neither makes gradient descent too slow nor fails to converge.

The learning rate alpha is a hyperparameter for this model. Even though it doesn't directly affect the model like the
parameters a0 and a1 do, it can impact the performance of our model, and we need to be cautious when choosing model hyperparameters.

That's all about linear regression. Congrats on understanding your first ML model. Please like this video if you found it useful, and if you have any doubts, please raise them in the comment section. Thanks for watching. [Music]
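
The cost function and the gradient descent updates described in this video can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the code from the video: the function names, the tiny made-up data set, and the starting point of zero for both parameters are all hypothetical. The slopes used are the standard derivatives of the cost J = 1/(2m) times the sum of squared errors with respect to a0 and a1.

```python
def predict(a0, a1, x):
    """Model from the video: y = a0 + a1 * x."""
    return a0 + a1 * x


def cost(a0, a1, xs, ys):
    """Cost J = 1/(2m) * sum((y_predicted - y_actual)^2)."""
    m = len(xs)
    return sum((predict(a0, a1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.01, iterations=10000):
    """Repeatedly step both parameters against their slopes.

    Starting from a0 = a1 = 0 is an assumption made for this sketch.
    """
    a0, a1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Error term e_i = y_predicted - y_actual for every example.
        errors = [predict(a0, a1, x) - y for x, y in zip(xs, ys)]
        # Slopes of J with respect to a0 and a1.
        slope_a0 = sum(errors) / m
        slope_a1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        a0, a1 = a0 - alpha * slope_a0, a1 - alpha * slope_a1
    return a0, a1


# Made-up training examples that lie exactly on the line y = 1 + 2x.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

a0, a1 = gradient_descent(sizes, prices)
print(a0, a1)  # close to 1 and 2

# A learning rate that is too large makes the cost grow instead of shrink.
bad_a0, bad_a1 = gradient_descent(sizes, prices, alpha=1.0, iterations=20)
print(cost(bad_a0, bad_a1, sizes, prices) > cost(0.0, 0.0, sizes, prices))
```

With alpha at 0.01 the parameters settle near the line the data was generated from, while the second run shows the failure to converge that the video warns about when alpha is set too high.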