Hello and welcome to today's lecture. As always, we will start by briefly recapping what we discussed in the last class. We had mostly been discussing correlation and regression in the case of bivariate data. So when you have X and Y and you see some trend, can you find a line which gives you a way of interpolating or extrapolating to determine Y at some unknown location? Let's say that, given a particular $x^*$, I want to know the value of $y^*$. The way to do it is to find the equation of the line from the trend itself.

In this particular case, when the data give us the impression that there is a linear trend, we fit the data points using a line $y = a + bx$. Clearly this equation has two unknowns, $a$ and $b$, and you want to find their values. If you had only two points, then $a$ and $b$ would be uniquely determined. But when you have more points, in general no single line passes through all of them, and there are infinitely many possible choices of $a$ and $b$. That is what brings us to linear regression: how do we go about finding the equation of the line which we think is the best representative of this data?

One of the strategies followed is to minimize the deviations. If this is the line that I am drawing and these are our points, I minimize the sum of the deviations of the points from this line. The deviations are simply $e_i = y_i - (a + b x_i)$. Now, if you simply take the total error as $E = \sum e_i$, this will underestimate the true error, because some errors are positive, when $y_i$ is greater than $a + b x_i$, and some are negative. So here you have a positive error, here you have a negative error, so these will
cancel each other. Hence this is not the approach to take. What you minimize instead is $E = \sum (y_i - (a + b x_i))^2$. Squaring means you are minimizing the net deviation without taking into consideration whether it is a positive deviation or a negative deviation.

We had also briefly discussed how to minimize a function of more than one variable. For a function of a single variable $f(x)$, you find the minimum by setting $f'(x) = 0$. When you have $g(x, y)$, you set $\partial g/\partial x = 0$ and $\partial g/\partial y = 0$. For example, if $g(x, y) = x^2 + xy + y^2$, then $\partial g/\partial x = 2x + y + 0 = 2x + y$, because when you take the derivative with respect to $x$ you assume $y$ is constant. Similarly, $\partial g/\partial y = 0 + x + 2y = x + 2y$. You set both of these equal to zero.

In our case the equation is $E = \sum (y_i - (a + b x_i))^2$, so I will set $\partial E/\partial a = 0$ and $\partial E/\partial b = 0$, because $a$ and $b$ are the unknowns we want to find. Each term is of the form $z^2$, and the derivative of $z^2$ is $2z$ times the derivative of $z$. Taking the derivative of $z = y_i - a - b x_i$ with respect to $a$: $\partial y_i/\partial a = 0$, $\partial(-a)/\partial a = -1$, and $\partial(b x_i)/\partial a = 0$, so I get a factor of $-1$:

$$\frac{\partial E}{\partial a} = \sum 2\,\bigl(y_i - (a + b x_i)\bigr)\,(-1) = 0.$$

This gives $\sum y_i = \sum (a + b x_i) = \sum a + b \sum x_i$. Now $\sum y_i = n\bar{y}$, $\sum a = na$, and $\sum x_i = n\bar{x}$, so this gives me the equation $\bar{y} = a + b\bar{x}$. Now let us do the other condition: $\partial E/\partial b = 0$ would
imply $\sum 2\,\bigl(y_i - (a + b x_i)\bigr)\,(-x_i) = 0$. The first part remains the same; the only thing is that now I multiply by the derivative of the bracket with respect to $b$, which is nothing but $-x_i$. I can simplify by getting rid of the $2$ and the minus sign, so I can write

$$\sum x_i y_i = \sum (a + b x_i)\,x_i = a \sum x_i + b \sum x_i^2 = n a \bar{x} + b \sum x_i^2.$$

So let me write down the final equations:

$$\bar{y} = a + b\bar{x} \qquad (1)$$

$$\sum x_i y_i = n a \bar{x} + b \sum x_i^2 \qquad (2)$$

I have two equations in two unknowns, $a$ and $b$. Can I eliminate one? I can multiply equation (1) by $n\bar{x}$ and see what we get:

$$n\bar{x}\bar{y} = n a \bar{x} + b\,n\bar{x}^2 \qquad (3)$$

Subtracting, (2) minus (3) gives $\sum x_i y_i - n\bar{x}\bar{y} = b\left(\sum x_i^2 - n\bar{x}^2\right)$, so I can uniquely determine $b$:

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.$$

Now, can I compare this with the definitions of covariance and the correlation coefficient? Let me write down the expression for covariance that we determined:

$$s_{xy} = \frac{\sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{n-1}.$$

I can write $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$; one factor of $n$ cancels against the division by $n$, so I simply get

$$s_{xy} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{n-1}.$$

In other words, the numerator in the expression for $b$ is exactly the numerator here. So the numerator of $b$ is nothing but n
minus 1 times $s_{xy}$, that is, $(n-1)\,s_{xy}$. Again, I will write down the expression for $b$:

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.$$

What is the denominator? It is nothing but $(n-1)\,s_x^2$, and the numerator we found was $(n-1)\,s_{xy}$. So

$$b = \frac{(n-1)\,s_{xy}}{(n-1)\,s_x^2} = \frac{s_{xy}}{s_x^2}.$$

Now, I know that my correlation coefficient is defined as $\rho = s_{xy}/(s_x s_y)$, so I can write $s_{xy} = \rho\,s_x s_y$, and then $b = \rho\,s_x s_y / s_x^2 = \rho\,s_y/s_x$. So $b$, the second coefficient, is determined. I also have the other equation, $\bar{y} = a + b\bar{x}$, so $a = \bar{y} - b\bar{x}$. Once I know $b$, I can determine $a$.

Let us work out a specific case and see what values we get for $a$ and $b$. Let's take a very simple example with five points: $x = 1, 2, 3, 4, 5$ and $y = 5, 10, 12, 15, 20$. This is my X and Y data, and I want to fit $y = a + bx$ and find the values of $a$ and $b$. As per this equation, $b = \rho\,s_y/s_x$, or equivalently $b = s_{xy}/s_x^2$, so I need to find $s_{xy}$ and $s_x^2$.

For calculating $s_{xy}$ I need the products $xy$, and I also need $x^2$. My $\bar{x}$ is equal to 3, and $\sum y = 62$, so $\bar{y} = 62/5$, which is exactly 12.4. Let me calculate $xy$: it is 5, 20, 36, 60, 100, so $\sum xy = 221$. And $x^2$ is 1, 4, 9, 16, 25, so $\sum x^2 = 55$. Now I can find $s_{xy}$: so $s_{xy}$ is equal to $\sum xy$ minus $n$ times $\bar{x}$ times y
bar, all divided by $n - 1$:

$$s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n-1} = \frac{221 - 5 \cdot 3 \cdot \frac{62}{5}}{4} = \frac{221 - 186}{4} = \frac{35}{4}.$$

And $s_x$ is the square root of $\left(\sum x^2 - n\bar{x}^2\right)/(n-1)$, so with $\sum x^2 = 1 + 4 + 9 + 16 + 25 = 55$, $n = 5$ and $\bar{x} = 3$:

$$s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{55 - 5 \cdot 3^2}{4} = \frac{10}{4} = \frac{5}{2}.$$

So $s_x^2 = 5/2$ and $s_{xy} = 35/4$, and my $b$ comes out to be

$$b = \frac{s_{xy}}{s_x^2} = \frac{35}{4} \cdot \frac{2}{5} = \frac{7}{2} = 3.5,$$

and $a$ comes out to be

$$a = \bar{y} - b\bar{x} = \frac{62}{5} - \frac{7}{2} \cdot 3 = 12.4 - 10.5 = 1.9.$$

So your final expression becomes $y = 1.9 + 3.5x$.

Let me write down all our values again together with the predictions. For $x = 1, 2, 3, 4, 5$ my $y$ values are 5, 10, 12, 15, 20, and the predicted values from $1.9 + 3.5x$ are 5.4, 8.9, 12.4, 15.9, 19.4.

This raises one important point. If you look at the percentage error, at $x = 1$ you are off by $(5.4 - 5)/5 = 8\%$, whereas at $x = 5$ you are off by $(20 - 19.4)/20 = 0.6/20 = 3\%$. So the absolute errors are similar, but the relative error is much larger where $y$ is small. This becomes important when you look at these errors as $y$ increases. Let's say in a particular case my X varies from 0 to 50, and these are my points. When I am here, at small X, this error is actually insignificant. So if
you count the magnitude of the error, where the total error is defined as $E = \sum e_i^2$, then when X is small, say $x = 1$, the magnitude of the error $e_1$ is going to be insignificant compared to, let's say, $e_{50}$, the error at $x = 50$. So here, instead of 3 you are getting a value of, say, 5 or 6, so your error is 2 or 3. But there, instead of 50 you are getting a value of 60 or 70, so that one is off by 10, and being off by 10 makes a much more significant contribution: when you write out your error expression, $e_{50}^2$ is going to be significantly greater than $e_1^2$. This tells you that for higher values your estimate is going to be better than for lower values.

How do you tackle this problem? One way is as follows. You have your data $(x_1, y_1), (x_2, y_2)$ and so on, and you calculate $z_i = a + b x_i$, so you define $z_1, z_2, z_3$ and so on. Instead of defining your error simply as $\sum (y_i - z_i)^2$, you can define it as

$$E = \sum \left(\frac{y_i - z_i}{y_i}\right)^2,$$

that is, each deviation is normalized by $y_i$ before squaring. What this ensures is that the magnitudes of the errors are normalized as well. Going from 3 up to 5 is a huge relative error: $2/3$, or about 67%. Going from 50 to 60, your actual error is 10, but 10 normalized by 50 is 20%, which is now comparable to the first value. Earlier you were comparing the jump of 2 directly with the jump of 10, and in that case your data would be fit much better in the latter portion, because that error gets a much greater weightage in the error expression. So this is one way to normalize. Of course, in this case it is not practical to do it by hand; you have to depend on programming to write
down the appropriate code.

One more thing I wanted to point out. In the best-case situation your data has a linear trend and it is easy to fit a straight line. But in the more generic case, which you will see very often, let's say X and Y are related as follows. I am actually drawing the curve and not the exact points; you can imagine the points lying on top of it. What you see is that the curve is not a straight line. If you fit this data with a straight line, your fit will look something like this, but it does not capture the essence of the phenomenon you are trying to study. So it is not always beneficial to just use a line to fit your data. You take a look at your data and see what information it conveys.

For example, if your data is like this, you can clearly see that there is a nonlinear increase in Y with X. What kind of function should you use? One way is to use a polynomial. Depending on how fast the curve is rising, you can choose Y as a function of $x^2$, say $y = a_0 + a_1 x + a_2 x^2$, or you can include an $x^3$ term as well. The nature of the rise, how fast it is, will dictate what kind of polynomial you use.

Now let us take one more example. Let's say you are designing a robot which moves between two points in the XY plane, from point A to point B. At this particular point it has a certain velocity, and at that point it has a velocity in a different direction. This is the XY plane, but the trajectory is actually a function of time. So think of this as the time axis: as a function of time you are moving along
the XY plane. You are given two pieces of information: at time $t_0$ and at time $t_1$ you are given the position and the velocity. So there are four conditions being prescribed. If I represent the position as the vector $\mathbf{r}$, then I know $\mathbf{r}$ and $\dot{\mathbf{r}}$, or $\mathbf{r}' = d\mathbf{r}/dt$, the velocity vector, which gives the direction of motion, at both times.

What kind of function can I choose? There are four unknowns for each coordinate of the trajectory. Let's say I assume a trajectory that is a cubic in time, $\mathbf{r}(t) = \mathbf{a}_0 + \mathbf{a}_1 t + \mathbf{a}_2 t^2 + \mathbf{a}_3 t^3$. I write $\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ as vectors because the position stands for both the X position and the Y position. What you are provided at each time is $x$ and $\dot{x}$, and similarly $y$ and $\dot{y}$, so you have two conditions per coordinate at each time. You choose the function depending on the number of conditions you have. In this particular case you can actually find the exact solution, because you know the positions at two different times and the velocities at two different times: I can use $x(t_0)$, $x(t_1)$, $x'(t_0)$ and $x'(t_1)$ to write four equations, and if you have four independent equations in four unknowns you can uniquely determine them. That gives you an idea of what kind of function you can use for the fit. Linear is only one particular case; in the generic case you need to fit with different kinds of functions.

That brings our lecture today to a close. We discussed regression, we worked out an example of how you find the regression line, and using that example we showed that when the error is calculated as the sum of the squares of the deviations, the deviations at low values of X are small and hence get less represented in the overall computation of the error. So our way of correcting
it is to normalize this error with respect to the value being fitted, so that you give relatively equal weightage to each value. With that, I thank you for your attention, and I look forward to meeting you again in the next class. Thank you.
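As a supplement to the lecture's worked example, here is a minimal Python sketch of the two fits discussed: the ordinary least-squares line using $b = s_{xy}/s_x^2$ and $a = \bar{y} - b\bar{x}$, and the fit that minimizes the deviations normalized by $y$. The function names `fit_line`, `fit_line_relative` and `relative_sse` are hypothetical, chosen only for illustration. Solving the normalized problem as a weighted least-squares fit with weights $1/y_i^2$ is one possible way to carry it out and is an assumption here, not necessarily the method intended in the lecture.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x via b = s_xy / s_x^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample variance, as defined in the lecture.
    s_xy = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (n - 1)
    s_xx = (sum(xi * xi for xi in x) - n * x_bar ** 2) / (n - 1)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def fit_line_relative(x, y):
    """Minimise sum(((y_i - (a + b*x_i)) / y_i)**2); every y_i must be nonzero.

    Assumption: this is treated as weighted least squares with w_i = 1 / y_i^2,
    and the two weighted normal equations are solved in closed form.
    """
    w = [1.0 / (yi * yi) for yi in y]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a = (swy - b * swx) / sw
    return a, b


def relative_sse(a, b, x, y):
    """Sum of squared deviations, each normalised by y_i."""
    return sum(((yi - (a + b * xi)) / yi) ** 2 for xi, yi in zip(x, y))


# The five-point data set from the lecture.
xs = [1, 2, 3, 4, 5]
ys = [5, 10, 12, 15, 20]

a, b = fit_line(xs, ys)
print(f"ordinary fit:   a = {a:.2f}, b = {b:.2f}")  # a = 1.90, b = 3.50

a_rel, b_rel = fit_line_relative(xs, ys)
print(f"normalised fit: a = {a_rel:.2f}, b = {b_rel:.2f}")
print("normalised error, ordinary fit:  ", relative_sse(a, b, xs, ys))
print("normalised error, normalised fit:", relative_sse(a_rel, b_rel, xs, ys))
```

The last two printed lines compare both fits on the normalized error measure; the normalized fit can never do worse than the ordinary fit on that measure, since it minimizes exactly that quantity.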