Transcript for:
Linear Regression Analysis with Excel and R

welcome to lecture five where we're going to see some examples of using excel as well as r to be able to do some things in regards to linear regression and multilinear regression and in general just sort of regression analysis so let's check out what it is that we can learn from this so excel is kind of crashing on me at the moment see if it comes back here we go okay so hopefully this is working looks like it looks like it's awesome okay so let's copy insert copied cells copy paste special just the values okay so now it's not random anymore on this so it won't change that's good let's save that and what are we going to do well so first let's look at how do you do sort of various things in excel so go to data data analysis and you can collect the select regression we're going to choose our y input range let's choose column f uh let's actually get a little bit better than that let's specifically select the entries i don't remember how far this goes down i think this is a big data table okay so i'm not going to do it that way i am just going to select that and it's like confidence level yeah let's do residuals residual plots line fit plots known probability plots see how big is this to that let's see that's one two wait one two three four zeros and a one so data analysis regression so f1 down to f one two three four no one g1 down to g1 one two three four and a one so we have our new worksheet this will take quite a while for this to run through there's actually a lot of data here so the way that we can do this then in excel is that what we do is that we say data data analysis regression and we're going to pick just a sample of data because if we do all of it it's going to break so f1 through f101 g1 through g 101 labels residuals residual plots and what we come up with is the following datasheets sheet and here we have the r square value which in this instance is extremely low we have our intercept values so this is the uh this is the code the intercept value of the linear equation set up of for this particular example trying to model value and cost so this is a very basic kind of model so you can see in excel it's pretty straightforward to do that and you can read off all the various things there's a t-test what is the t-test doing the t-test is testing for whether this value is zero and here's the p-value for that t-test which you can interpret based upon what a p-value is here we have also confidence upper and lower confidence intervals on this value which is the intercept value so with 95 confidence we can say that the intercept value is in between negative that number up to positive this number right here so f test what is the f test the f test is testing whether sort of you can think in the sense whether there is any interaction between these variables and so that's that's just what excel would say so let's take a look at how we could use r to do some of the same same analysis or i quite like r i prefer r actually i think it's better to use r and use other tools okay so we want to open up our studio and uh and i'd like to share the screen here rstudio there we go so hopefully everybody can see it and first thing is we need to know how do you import a data set so import from excel because i happen to have this as an excel file then i'm going to browse i select like usual so we have our file and we're then going to import awesome we now have our file here's this un data and let's create then a linear model lm quad lamb of now i have to call up this particular u n u n data and i should just give a value value connected with what is this one called i think it's cost cost seven we'll see if that works that does not work so let's see inspect this cost dot dot dot seven okay i think that's what i'm missing okay so because i cannot see the capital versus uncapital when i write this this is what i want to write so i now have a linear model so i can call up lm [Music] mod there we go and it gets me my coefficients my intercept as well as then the values for that so here we can see the residuals versus fitted and if there is absolutely no heteroscedasticity you should see a completely random equal distribution of points throughout the x range and a flat red line so here we have heteroscedasticity our primary we have no heteroskedasticity here we have the theoretical quantiles but the more interesting plot for us for this particular model is not even the scale location but residuals versus leverage so you can look at the residuals versus their leverage which their leverage is the importance of it let's actually go back so scale location is another one standardized residuals are on the y-axis here so because there is no heteroscedasticity that means that there's no variability of a variable across the range of values of the variable that predicts it and therefore then there's no variance of our model or rather there's equal variance so this is a perfect example of where we can then move ahead with the regression analysis so this data set could be referred to as homoscedastic meaning it has same variance now were it to be heteroscedastic uh the data right let's go back to here the data here we go the data would tend to be cone-shaped and to have then a this red line would be increasing or decreasing one of those kinds of things so based upon the fact that i am presuming that you have seen these things before i'm going to say that this is a sufficient very quick introduction to linear regression both in r as well as in excel