Now, why is this called the variance inflating factor? From the formula you can see that if R_j² tends to 1, then Var(β̂_j) tends to infinity. That is why 1/(1 − R_j²) is called the variance inflating factor. If there is a high degree of linear dependence among several explanatory variables, you can expect R_j² to be quite high, and as R_j² tends to 1 you get Var(β̂_j) tending to infinity. So this is a serious consequence. Recall that

Var(β̂_j) = σ² / (Σ x_j² (1 − R_j²)),

and when R_j² tends to 1, Var(β̂_j) tends to infinity. If you plot R_j² on the x axis and Var(β̂_j) on the y axis, the curve rises slowly at first and then shoots up steeply as R_j² approaches 1.

Now, when Var(β̂_j) increases, the standard error of β̂_j also increases. And what happens to the t statistic, which is nothing but β̂_j divided by se(β̂_j)? If the denominator increases, the t statistic falls drastically and becomes insignificant. So in the presence of multicollinearity, the t value you observe is not the actual t value: the variance, and hence the standard error, got inflated in the presence of MC, so the t statistic appears artificially low, and the individual variables will appear to be insignificant.

But even though the individual t statistics are insignificant, you will still get a high R², because that comes from the overall fit of the model. So what is the consequence of multicollinearity? Suppose your model is y_i = α + β₁x_1i + β₂x_2i + u_i, and by the individual t statistics, say t₁ and t₂, both variables are insignificant in the presence of MC, yet the R² you get from this model is quite high. This is the classic symptom as well as consequence of multicollinearity: many of your explanatory variables appear to be insignificant, but you still see a high R² from the model. So whenever you estimate a model and immediately see that R² is quite high but many of your explanatory variables are individually insignificant, it should immediately click in your mind that your data are suffering from a multicollinearity problem. Let me write the consequences briefly: the variance and the standard error get inflated in the presence of MC (multicollinearity), and as a result you get insignificant t statistics for the explanatory variables but a high R². These are the two consequences of multicollinearity.
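To make this concrete, here is a minimal simulation sketch, my own illustration and not from the lecture, of the classic symptom: two nearly collinear regressors give a high overall R² but inflated standard errors and small slope t statistics. It assumes numpy and statsmodels are available; all the data-generating values are made up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.98 * x1 + 0.02 * rng.normal(size=n)   # x2 is almost collinear with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()
print(res.rsquared)   # overall R^2: high
print(res.tvalues)    # slope t statistics: artificially small
print(res.bse)        # standard errors of the slopes: inflated
```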
So out of the three desirable properties, the consistency and unbiasedness properties are still maintained, but the efficiency property gets disturbed. When I say efficiency, it means that for a given sample the variance of a particular β̂_j, when x_j is correlated with the other variables, gets disturbed, and that results in exactly this type of situation. This is the classic consequence as well as symptom of the multicollinearity problem.

Now that we have the consequences, the next question is: how will you detect multicollinearity? First, a simple measure: check the pairwise correlations among the explanatory variables. For example, suppose my model is y_i = β₁ + β₂x_2i + β₃x_3i + β₄x_4i + u_i. The simple measure, which you also report in a paper before presenting your regression results, is the pairwise correlation between x₂ and x₃, between x₃ and x₄, and between x₂ and x₄. In the statistical software we use, after importing the data you simply type the Stata command corr x2 x3 x4, and that gives you the pairwise correlations. Any pairwise correlation of 0.8 or more is high: say r₂₃ = 0.85, r₂₄ = 0.81, or r₃₄ = 0.90. If you get this type of result, you will understand that there is some kind of multicollinearity problem in your data set, and you must rectify it.
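As a sketch of this first check, here is a Python analogue of the Stata command corr x2 x3 x4, my own illustration with made-up data: a pairwise correlation matrix with the 0.8 rule of thumb applied. The column names x2, x3, x4 just follow the lecture's model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
base = rng.normal(size=n)
df = pd.DataFrame({
    "x2": base + 0.3 * rng.normal(size=n),   # x2 and x3 share a common factor
    "x3": base + 0.3 * rng.normal(size=n),
    "x4": rng.normal(size=n),                # x4 is unrelated
})

corr = df.corr()   # pairwise correlation matrix, as corr x2 x3 x4 would show
print(corr.round(2))

# flag any off-diagonal pairwise correlation of 0.8 or more in absolute value
mask = (corr.abs() >= 0.8) & (corr.abs() < 1.0)
print(corr.where(mask).stack())   # the suspect pairs, if any
```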
But as I said, pairwise correlation is not a very rigorous statistical measure for detecting multicollinearity; it is only a simple pairwise measure, and it has a limitation. For example, suppose x₄ is perfectly linearly dependent on x₂ and x₃, say λ₂x_2i − λ₃x_3i − λ₄x_4i = 0. From this relationship we can expect R²_4.23 = 1; that is, when I regress the fourth explanatory variable on the second and third, the R² from that regression comes out to be 1. But R²_4.23 can be written in terms of the pairwise correlations:

R²_4.23 = (r_43² + r_42² − 2·r_43·r_42·r_23) / (1 − r_23²),

where R²_4.23 is the R² from regressing the fourth variable on the second and third, and r_43, r_42, r_23 are pairwise correlation coefficients. Now substitute 1 on the left-hand side and look at this relationship carefully: the left-hand side is 1, so the pairwise correlations must take specific values to satisfy it. But even for r_42 = 0.5, r_43 = 0.5, and r_23 = −0.5, the condition is satisfied. The pairwise correlations are not very high, yet you have a perfect degree of multicollinearity: R²_4.23 = 1, the fourth explanatory variable is perfectly linearly dependent on the second and third, they perfectly explain it, and still the pairwise correlations are only 0.5 in absolute value. That is why low pairwise correlation does not rule out the possibility of a multicollinearity problem: if the pairwise correlations are quite high you understand there is multicollinearity, but if they are low, they do not rule it out, and we have to check some other measure.

The other measure, which is also simple to apply, is the VIF, that is, 1/(1 − R_j²). If the VIF so defined is greater than 10 for the j-th variable, that implies the j-th variable shows MC, that is, it is strongly linearly dependent on the others. This is a simple formula to apply, and it is very easy to obtain from statistical software as well. And the third one, as I said: many of the explanatory variables are individually insignificant, as their t statistics are insignificant, but the R² is high. This will also help you detect multicollinearity in your data set.
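Here is a quick check of the counterexample above, my own arithmetic rather than anything from the lecture: the modest pairwise correlations 0.5, 0.5, −0.5 really do give R²_4.23 = 1, and the implied VIF blows up past the cutoff of 10.

```python
# pairwise correlations from the counterexample
r42, r43, r23 = 0.5, 0.5, -0.5

# R^2 from regressing x4 on x2 and x3, written in terms of pairwise correlations
r2_4_23 = (r43**2 + r42**2 - 2 * r43 * r42 * r23) / (1 - r23**2)
print(r2_4_23)   # -> 1.0: perfect linear dependence despite modest pairwise r's

# VIF_4 = 1 / (1 - R^2_{4.23}); guard against division by zero
vif_4 = float("inf") if r2_4_23 >= 1.0 else 1.0 / (1.0 - r2_4_23)
print(vif_4)     # -> inf, far beyond the rule-of-thumb cutoff of 10
```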
So we have discussed the consequences of multicollinearity and how you can detect it. The next thing we need is the solution: what to do to solve multicollinearity. In this context, econometricians generally say that multicollinearity is a sample problem; this is a very important statement I am making. That means we need to modify our sample appropriately to solve the multicollinearity problem. Let me explain. Suppose your model is consumption = α + β₁·income + β₂·wealth. When I include both income and wealth in my model to estimate the consumption function, what do we assume? We assume that in the population, income and wealth are not correlated; that is what consumption-function theory says, that consumption depends not only on an individual's income but also on wealth, and we assume in the population they are not correlated. But in the sample it may so happen that you got your information from individuals who have high income together with high wealth. So how will you solve this problem? The solution is that in our sample we should have individuals with higher income but lower wealth, and individuals with higher wealth but lower income. We need to design our sampling strategy in such a way that this property is satisfied. That is why they say multicollinearity is basically a sample problem.

But it is not always possible to get that type of sample, so how will you get it? Just increase your sample size. If you increase the sample size, you have a higher probability of getting individuals who have higher wealth with low income and higher income with low wealth. And when you increase the sample size, one more thing happens. Look at the variance again: Var(β̂_j) = σ² / (Σ x_j² (1 − R_j²)), where Σ x_j² = Σ (X_j − X̄_j)², the lowercase x_j being the deviation of the variable from its mean. If you increase the sample size, this sum of squares obviously increases as well, and when it increases, Var(β̂_j) comes down. That is why increasing the sample size solves two problems at once: you get individuals with higher income and lower wealth and with lower income and higher wealth, and at the same time you reduce the severity of the multicollinearity problem (a small sketch of this effect follows below). Now, there are other ways by which we can solve the multicollinearity problem, and we will discuss those other measures in our next class. Thank you very much.
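As a closing sketch, my own illustration and not from the lecture: holding the degree of collinearity between the regressors fixed, a larger sample raises Σ(X_j − X̄_j)² and pulls the standard error of β̂_j back down. The data-generating values are made up; it assumes numpy and statsmodels.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def se_beta1(n):
    """Standard error of beta_hat_1 with two highly correlated regressors."""
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)   # fixed degree of collinearity
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    return sm.OLS(y, X).fit().bse[1]

for n in (50, 500, 5000):
    print(n, round(se_beta1(n), 4))   # the standard error shrinks as n grows
```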