We're in Section 4.3. What we're going to do is develop a model that, as best as possible, follows the pattern of the data and helps us make predictions. To do that, we first need to determine the equation that's going to represent it. So here we go. When we have a linear relationship, we can develop a regression line to make predictions. What do I mean by a regression line? Well, if you guys look at the two graphs at the bottom of the screen, you'll see a scatter plot drawn as blue dots, but more importantly, I want you to see the red line being drawn through it. That red line is the regression line. As you can see, it doesn't hit every dot perfectly; rather, it's the best fit. What do I mean by best fit? I mean that it runs through the center of the dots. The idea behind the regression line, much like the idea of mean and median, is that it's the one line that best explains the whole set of data. When we talked about the mean in Chapter Three, the mean was that one number that best summarized all of the data. In the same way, the regression line is that one line that best represents all of the data in blue. So notice there's a huge callback to what we just did. Now, how are we going to develop this regression line? Well, we're going to go back to our friend algebra, and remember, the regression line is y = a + bx. Over the next few pages we're going to discuss three ideas: what is the equation, what is the slope, and what is the y-intercept. But first and foremost, let's just look at the equation, all right?
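To make "best fit" a little more concrete, here is a minimal sketch in Python. It assumes that "best fit" means the ordinary least-squares line, which is the usual choice but isn't something the lecture has spelled out at this point. The five data points are made up purely for illustration, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x.

    Assumption: "best fit" means ordinary least squares.
    """
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the explanatory values
    y_bar = sum(ys) / n  # mean of the response values
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: chosen so the line passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


# Made-up points for illustration only (not data from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print(f"predicted y = {a:.3f} + {b:.3f} * x")
```

One thing worth noticing is that a least-squares line always passes through the point made of the two means, which is one sense in which it sits in the center of the dots.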
When we look at this regression line equation, I want you to see that there are two variables flying around, y and x. Don't worry about a and b; I'm going to talk about those a little bit later. What I want you to focus on is the letters x and y. y is the variable we want to predict, whereas x is a given value, a value we're going to use to predict y. And so we give names to these variables. The y variable, what we want to predict, is called the response variable because it's dependent on what x value we're given, whereas the x variable, what we're using to make the predictions, is called the explanatory variable. It's independent in that you get to choose what value you want for x to help dictate what y is. When it comes to writing out your regression model, a lot of times we'll write it out in words: we write "predicted," then our response variable, then "is equal to a plus b times" whatever the explanatory variable is. I just want to emphasize that the one word we're absolutely using in this formula is "predicted." Like we saw on the previous page, this line is not perfectly hitting all of the dots, and because it's not perfect, we use the word "predicted" to emphasize that it's as close as we can get. Now, I know I talked about this very hypothetically, so let's do some practical examples right now. In Example 1, the EPA wants to use a car's weight in pounds to predict its city miles per gallon for a sample of 14 cars. They found a strong negative linear relationship between a car's weight and its city miles per gallon. So I want us to look at this regression model. What we want to use is a car's weight, so right off the bat I want you guys to see that the x variable, the explanatory variable, is weight.
What we want to predict is the car's city miles per gallon, so that is my y variable, the response variable. Now that we understand what weight and miles per gallon represent, notice that the predicted miles per gallon is on the left of the equation and the weight is on the right, where all the numbers and the subtraction sign appear. Now let's do part A. I want to use this model to predict the approximate city miles per gallon for a car that weighs 2,780 pounds. Again, if we want to predict city miles per gallon when given a specific weight, we take the equation, predicted city miles per gallon is equal to 42.154 minus 0.007 times the weight, and we literally replace the weight with the value we're given, 2,780, and plug that into the formula. Now keep in mind, when it comes to order of operations, you have to do the multiplication first, all right? Why don't you guys plug that in. What is this emphasizing? It's emphasizing that if a car weighs 2,780 pounds, we predict, and I need to emphasize we're only making a prediction, this isn't 100% guaranteed, that its city miles per gallon is going to be about 22.7. But the second question is, would it be appropriate to use this model if I wanted to look at a really light car? In this case, is it appropriate to use the model for a car that weighs 1,880 pounds?
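Going back to part A for a second, here is the plug-in written out step by step, multiplication first. This uses a slope of 0.007, which is the value consistent with the 22.7 answer; the hat over "city MPG" is just shorthand for "predicted."

```latex
\begin{align*}
\widehat{\text{city MPG}} &= 42.154 - 0.007 \times \text{weight} \\
                          &= 42.154 - 0.007 \times 2780 \\
                          &= 42.154 - 19.46 \\
                          &= 22.694 \approx 22.7 \text{ miles per gallon}
\end{align*}
```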
Well, in this case the answer would be no. And why? I want you guys to zoom in on the graph. What I want you to see is that my weights range from about 2,400 lb to a little bit above 3,400 lb, so we can see that the weight data runs from about 2,400 to about 3,600 pounds. That means the line in the graph, this equation right here, only applies to weights between those values. So my question for you is, is this weight of 1,880 pounds within the range of the data? No, it's not. In particular, when it comes to doing regression analysis, you're only allowed to apply the equation within the range of the data. So in this case, no, we can't, since the weight of 1,880 pounds is outside the range of the data.
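Both parts of Example 1 can be sketched together in Python: make the prediction, but only when the weight falls inside the range of the data. This is just an illustration, not an official EPA tool. The names `predict_city_mpg`, `MIN_WEIGHT`, and `MAX_WEIGHT` are hypothetical, the 2,400 to 3,600 lb bounds are the approximate values read off the graph, and the slope of 0.007 is the value consistent with the 22.7 answer in part A.

```python
# Model from Example 1: predicted city MPG = 42.154 - 0.007 * weight.
INTERCEPT = 42.154
SLOPE = -0.007

# Approximate range of weights in the data, as read off the scatter plot.
# These bounds are an assumption; the exact minimum and maximum are not given.
MIN_WEIGHT = 2400
MAX_WEIGHT = 3600


def predict_city_mpg(weight_lb):
    """Predict city MPG, refusing weights outside the range of the data."""
    if not (MIN_WEIGHT <= weight_lb <= MAX_WEIGHT):
        raise ValueError(
            f"{weight_lb} lb is outside the range of the data "
            f"({MIN_WEIGHT} to {MAX_WEIGHT} lb), so the model should not be used"
        )
    return INTERCEPT + SLOPE * weight_lb


# Part A: 2,780 lb is inside the range, so we get a prediction.
print(round(predict_city_mpg(2780), 1))  # 22.7

# Part B: 1,880 lb is outside the range, so the function refuses.
try:
    predict_city_mpg(1880)
except ValueError as err:
    print(err)
```

Raising an error here, rather than quietly returning a number, mirrors the rule from the lecture: the equation is only allowed within the range of the data.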