Okay, here I am back in RStudio. We're talking about the independent variable, temperature, and the dependent variable, consumption. And we did the correlation.
We did a mean of one of the columns. And we did, you know, just some stuff. So now the question is a more sophisticated model. As we did before with our ice cream, we import the data set again.
I'm going to go over it. I import it here. See, press the little button.
See, I need library(readr). That's the readr package; I'm going to go over these packages after. So we have consumption, income, price, and temperature. I'm going to attach the sucker. So we attach it.
Ice cream. Ba-boom. So here we have it attached.
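A minimal sketch of the import-and-attach step, for anyone following along without the file. The numbers below are made up for illustration and are not the real ice cream data; the column names `cons`, `income`, `price`, and `temp` and the file name in the comment are assumptions based on what is read out in this session.

```r
# Stand-in for the imported file: made-up numbers, NOT the real ice cream data.
# Column names (cons, income, price, temp) are assumed from the session.
Icecream <- data.frame(
  cons   = c(0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.45, 0.47),
  income = c(78, 79, 81, 80, 76, 78, 82, 79),
  price  = c(0.270, 0.282, 0.277, 0.280, 0.272, 0.262, 0.275, 0.267),
  temp   = c(30, 35, 42, 45, 55, 60, 68, 72)
)

# With a real file, the import would look roughly like this (hypothetical path):
# library(readr)
# Icecream <- read_csv("Icecream.csv")

str(Icecream)      # check the column names and that every column is numeric
attach(Icecream)   # after this, columns can be used by their bare names
mean(cons)         # works because the data frame is attached
detach(Icecream)   # detach when done so the names do not linger
```

Attaching is a convenience; the later steps also work without it by passing `data = Icecream` to the function.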
Now we're going to do something, right? Very important. So we're going to do a very sophisticated model called regression analysis. And this is what we're going to do with the guerrilla data set in the second half of the course, because we're going to go through all the national security data sets and see how we can test certain ideas, like is the government doing well with terrorism?
What's the threat level of chemical attacks, tested using mice? Sorry; if people are in PETA, I apologize. But as we read in the article, that's what they use to actually test it. So we're really going to do the real research they do in national security, homeland security, et cetera.
So the idea here is, does temperature cause an increase or decrease in consumption of ice cream? Testing that kind of relationship is what they call regression analysis. So I'm going to put yummy.
You call it whatever you want. Oh, I already have a yummy. Yeah. So we're not going to call it that.
Let's call it, what do you people like? Let's call it Pokemon. And with Pokemon, now we're going to use functions. I'm going to explain this to you. That's what we call naming an object: Pokemon.
Then the way I do it, I use this arrow, the less-than sign and the hyphen. You could also use an equal sign, but equal signs are used for other things, like arguments inside a function, so I don't do it. So this is what
I call it. So it's going to be called Pokemon. That's the name.
And then I put the arrow. This is what you call coding, what they do in national security, any type of high-level research. So I'm going to use a linear model. Notice all of these variables are numbers, right?
So that allows us to use what we call linear models. And that's exactly what the function is called: lm. A function is essentially like a command.
So that is a command. Then in parentheses you put whatever you want it to work on. Coding is always filled with parentheses.
Once you open a parenthesis, you have to close it. That's why it closes automatically here. But if you do start writing longer code, you've got to make sure it's closed.
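A small sketch of the two ideas just described, the assignment arrow and a function call with its parentheses. The object names here are hypothetical and only for illustration.

```r
# The arrow (<-) stores the result of a function call under a name.
yummy <- mean(c(1, 2, 3))

# Inside the parentheses, the equal sign names an argument of the function.
rounded <- round(x = 3.14159, digits = 2)

# Every opening parenthesis needs a closing one, including nested calls.
nested <- sqrt(abs(-16))

print(yummy)
print(rounded)
print(nested)
```

Using `<-` for naming and `=` for arguments keeps the two jobs visually separate, which is the habit described above.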
So what am I going to do? The dependent variable always comes first, which is consumption. That's the result, right? And we're going to see this when we apply it to national security. So cons; it's right here, as you can see, consumption. Then we do a tilde. That's above this little mark here, on the same key. It's on the far left-hand side, top left-hand corner of the keyboard.
You can see it right over there. I know no one knows where it is because no one ever uses it. Then temperature, TMP. You see it right here.
That's the independent variable. That is the cause. Does temperature cause an increase or decrease in consumption?
So then a comma; you always need commas between arguments. Then data equals ice cream, right? When you start typing it, you should see it come up there. And I hit Enter. Boom, we just made a model. What's interesting about this is that people make models so easily. It's not that difficult, and you shouldn't be afraid of it. So Pokemon is what we call this. Now we want a summary. Just type in summary, and you'll see it up here; you don't always have to press these buttons, but you'll see it. And then you see Pokemon. Now, Pokemon's having a party.
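The two commands just typed can be sketched as follows. The data frame is a made-up stand-in, not the real ice cream data, and the column names `cons` and `temp` are assumptions; the output will therefore not match the numbers read out in this session.

```r
# Made-up stand-in data; NOT the real ice cream data set.
Icecream <- data.frame(
  cons = c(0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.45, 0.47),
  temp = c(30, 35, 42, 45, 55, 60, 68, 72)
)

# Dependent variable left of the tilde, independent variable right of it,
# then a comma, then the data frame the columns come from.
Pokemon <- lm(cons ~ temp, data = Icecream)

# Prints the call, the residuals, the coefficient table,
# the residual standard error, and the R-squared.
summary(Pokemon)
```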
You can call it anything you like. So I go here. And then I hit the button. Now, boom, boom, you can hit this up. And you won't see it.
Then you hit this down. You won't see it. Then you'll see just the 30 data points.
That's the large-N data. I want to get that up. And this is what we call our output. This is the call: lm, the formula, consumption tilde temperature, and data is ice cream. These are the residuals.
This is what we call coefficients, which we'll be reading. Right. So what are these numbers?
Don't get scared. People get really scared. Oh, my goodness.
Numbers. This is the independent variable. This here, the estimate, is our practical value.
It's basically our slope, right? So you basically want to ask yourself, how much on average is consumption moving with every unit of temperature? Now, 0.003 doesn't seem like a lot, but when you look at consumption being 0.3, 0.4, you know, 0.28, that is actually a good movement of consumption.
So this is the independent variable, temperature, and it's moving consumption. A lot of people are visual learners, so you're going to see it on a graph in a second, but that's the slope of the line. With every unit of temperature, consumption is increasing by 0.0031. And that is very, very important. And the standard error is basically asking, you know, you see this in a lot of data on polling.
It's like, if I keep taking new samples of this ice cream and temperature data, how much will the estimate move around? Because remember, we're only dealing with a sample of the population. So this is saying it moves very little.
So that means the slope estimated from this sample is very precise, because this is a low standard error. Any number with the word error in it, as you see here with the residual standard error, which I'm going to talk about, means a better fit for your model when it is lower. So the lower the better. This is very, very low. So it's saying, you know, if we keep on doing random sampling, the slope will be basically the same, typically around 0.0004779 off, which is very low.
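To make the estimate and its standard error concrete, here is a hedged sketch that pulls both out of the coefficient table. It runs on made-up stand-in data with assumed column names `cons` and `temp`, so the values differ from the 0.0031 and 0.0004779 read out above.

```r
# Made-up stand-in data; NOT the real ice cream data set.
Icecream <- data.frame(
  cons = c(0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.45, 0.47),
  temp = c(30, 35, 42, 45, 55, 60, 68, 72)
)
Pokemon <- lm(cons ~ temp, data = Icecream)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(Pokemon))
print(coefs)

slope <- coefs["temp", "Estimate"]     # change in cons per one unit of temp
se    <- coefs["temp", "Std. Error"]   # how much the slope varies across samples

# Predicted change in consumption for a 10-unit rise in temperature
print(slope * 10)

# A 95% confidence interval for the slope, built from the estimate and its SE
ci <- confint(Pokemon, "temp", level = 0.95)
print(ci)
```

A narrow interval that stays away from zero is the same message as a low standard error: repeated samples would give roughly the same slope.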
The t-value: you do not need to know math for this. The t-value is just the estimate divided by its standard error, so it measures how far the slope is from zero relative to its uncertainty. The higher the t-value, in absolute terms, the better, because that gives us what we call our p-value. The p-value is important because anything under 0.05 in, I believe, medicine, national security, just a broad spectrum of academic and policy disciplines, is considered, quote unquote, statistically significant.
I like to think of it as, when you get under 0.05, which is hard to do anyways, it's saying you need this independent variable, the cause, temperature, for the dependent variable, in this case consumption up here, to move up or down. More exactly, a slope this large would show up less than 5 percent of the time if temperature had nothing to do with consumption. So it's essentially saying under 0.05, right, this is statistically significant. And as you come down here, this multiple R-squared is the square of the correlation we did before, saying how much of the movement in consumption is being explained by temperature.
So it runs from zero to one, and with real data you basically never get a perfect one, but the closer to one, the better the fit of your model. Notice I don't say you want it closer to one, because the problem is a lot of scientists say that, and yet if you're doing research on, say, a COVID medicine, you're not supposed to want anything. You're supposed to let the data fall where it may. If you're doing research on terrorism, which we are going to do, you're supposed to let the data fall where it may. There's a lot of corruption in statistics.
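The link between the multiple R-squared and the earlier correlation can be checked directly: with one independent variable, R-squared is the correlation squared. This is a sketch on made-up stand-in data with assumed column names `cons` and `temp`.

```r
# Made-up stand-in data; NOT the real ice cream data set.
Icecream <- data.frame(
  cons = c(0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.45, 0.47),
  temp = c(30, 35, 42, 45, 55, 60, 68, 72)
)
Pokemon <- lm(cons ~ temp, data = Icecream)

r  <- cor(Icecream$cons, Icecream$temp)   # the correlation from earlier
r2 <- summary(Pokemon)$r.squared          # multiple R-squared from the model

print(r)
print(r2)
print(all.equal(r^2, r2))   # TRUE: same quantity, one is the square of the other
```

This identity holds for a model with a single independent variable; once more variables are added, R-squared no longer equals the square of any one pairwise correlation.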
Residual standard error, right? Again, the lower the better, because this is measuring the extent to which your actual values vary away from what we call the best-fitting line, which you're going to see in a minute. And the degrees of freedom is the number of data points you have, we have 30, minus the number of coefficients the model estimates, the intercept and the slope, so 30 minus two is 28. And this p-value, right, is very, very low; it looks odd because that's scientific notation. And when R, RStudio, puts three stars, it means the p-value is under 0.001, so it's statistically significant. So there's a good function for this, right? It's called format. You put the number here, then scientific equals FALSE. Get a little FALSE up there.
So look at that number. That's without the scientific e. So count the decimal places: 1, 2, 3, 4, 5, 6, and then 7. That's 7, including that 4, right?
4, 7, 9. So that is a pretty strong relationship. This p-value is the probability of seeing a relationship this strong if the independent variable really had no effect, and anything under 0.05 is considered low. So think of it like that.
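A short sketch of the `format` step. The value `4.79e-07` is an assumption reconstructed from the digits read out above (seven decimal places, then 4, 7, 9); substitute whatever p-value the summary actually prints.

```r
# Assumed p-value, reconstructed from the digits read out in the session.
p_value <- 4.79e-07

# scientific = FALSE writes the number out in full instead of using e-notation.
shown <- format(p_value, scientific = FALSE)
print(shown)

# Position of the first nonzero digit after the decimal point.
decimals <- sub("^0\\.", "", shown)
first_nonzero <- as.integer(regexpr("[1-9]", decimals))
print(first_nonzero)
```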
The lower the better, since it signifies you need temperature for consumption to move. And we're going to apply this to national security and law. Now, a lot of people are looking at this, right? These are the numbers. And they say, wow, that really might be difficult.
I don't know if I can do this. It's actually all trial and error. And it's actually quite easy.
Because you just have to remember, this is our practical value, the estimate. This is our standard error.
The lower, the better, because it's saying with random samples, you're getting about the same numbers. This is our t-value, the estimate divided by the standard error, indicating how far the slope is from zero. So it's moving.
Something's going on here. And that gives us our p-value. For the t-value, the bigger, the better. And then for the p-value, the lower the better. So under 0.05, they say it's quote-unquote statistically significant.
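For anyone who wants to see where these columns come from, here is a sketch that rebuilds the t-value, the p-value, the degrees of freedom, and the residual standard error by hand and compares them with what `summary` reports. It uses made-up stand-in data with assumed column names `cons` and `temp`.

```r
# Made-up stand-in data; NOT the real ice cream data set.
Icecream <- data.frame(
  cons = c(0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.45, 0.47),
  temp = c(30, 35, 42, 45, 55, 60, 68, 72)
)
Pokemon <- lm(cons ~ temp, data = Icecream)
s <- summary(Pokemon)

est <- coef(s)["temp", "Estimate"]
se  <- coef(s)["temp", "Std. Error"]

# t-value: the estimate divided by its standard error
t_by_hand <- est / se

# Degrees of freedom: data points minus the two estimated coefficients
# (intercept and slope)
df <- df.residual(Pokemon)

# Two-sided p-value from the t distribution
p_by_hand <- 2 * pt(-abs(t_by_hand), df)

# Residual standard error: typical distance of the observed values from the line
rse_by_hand <- sqrt(sum(residuals(Pokemon)^2) / df)

print(c(t = t_by_hand, p = p_by_hand, df = df, rse = rse_by_hand))
```

A bigger t-value in absolute terms always maps to a smaller p-value for the same degrees of freedom, which is why the two rules of thumb above go together.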
Residual standard error, as you're going to see, I'm going to show you on the graph. But this one, the multiple R-squared, the closer to 1 the better; 0.6 is very good. So what do we do? We're going to graph it, and I'm going to show you because we're going to be graphing a lot.
And that's kind of a cool thing because it's going to be important for a broad range of types of work, academic and policy disciplines. So what you do is you install packages here. Now you have to respect this because people stay up all night, all days, all years, et cetera, and never, you know, get married or have kids, to make these packages.
These are all free. So the one we're going to use, you put it in quotes, right? It's called plotly. That's an interactive ggplot that has been very, very, um, ooh.
I'm just going to say no. R warning: the package is in use and will not be installed. All right. So let's take a look. Nope, it didn't get installed.
So I'd better say yes. So let me go back up. I guess it was updated. And I'm going to say yes. And now it's backing up our session.
See, this is what you get sometimes with this. But now I'm going to have to go back and make sure ice cream is attached again. So it's restarting our session.
Now let's take a look. Did plotly come up? Let's do it again. So then after you install it like that, you type library and then plotly.
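The install-then-load sequence can be sketched like this. The install line is commented out so the snippet does not download anything on its own, and the `requireNamespace` guard is an addition for safety, not something typed in the session.

```r
# Run once per machine; the package name goes in quotes.
# install.packages("plotly")

# Check whether the package is available before loading it.
has_plotly <- requireNamespace("plotly", quietly = TRUE)

if (has_plotly) {
  library(plotly)   # run in every new session; no quotes needed here
  message("plotly is loaded")
} else {
  message("plotly is not installed yet; run the install.packages line first")
}
```

If R reports that the package is in use and will not be installed, restarting the session and installing again, as done above, normally clears it.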
Here it is. That's a package of graphs. It actually works with ggplot.
See, it's called ggplot2. Why is this important? Well, in RStudio, in Python, and in other more avant-garde research tools, and the goal here is that we're going to do research, we're just using the ice cream as kind of a test run, this will allow us to make beautiful interactive graphs, right?
So what have we done so far? We got the ice cream data set, right? Up here: consumption, income, price, temperature. We're not using income or price yet.
We attached it, and ran the mean of consumption, the mean of temperature, and the standard deviation of temperature. That's how much it deviates from the average.
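The descriptive steps in this recap, plus the correlation mentioned next, can be sketched as follows. The data are a made-up stand-in with assumed column names `cons` and `temp`, so the results will not match the real session.

```r
# Made-up stand-in data; NOT the real ice cream data set.
Icecream <- data.frame(
  cons = c(0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.45, 0.47),
  temp = c(30, 35, 42, 45, 55, 60, 68, 72)
)

m_cons  <- mean(Icecream$cons)   # average consumption
m_temp  <- mean(Icecream$temp)   # average temperature
sd_temp <- sd(Icecream$temp)     # typical deviation of temperature from its mean
r       <- cor(Icecream$cons, Icecream$temp)   # strength of the linear relationship

print(c(mean_cons = m_cons, mean_temp = m_temp, sd_temp = sd_temp, cor = r))
```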
The correlation between these two variables, consumption and temperature, is very high. And then we did a very advanced model. We called it Pokemon. We assign to Pokemon the lm. That's the linear model; that's the command.
Consumption is our dependent variable. Then the tilde.
Temperature is our independent variable. And the data we're using is ice cream. So we run summary on Pokemon. That's what we get.
So these are the different coefficients and essentially our output. This is our practical value, the estimate; the slope is saying, you know, for every unit of temperature, we're getting an increase of 0.003 in ice cream consumption. And then I'm just going to go straight to the p-value, because I'm going to explain everything else again: this is under 0.05, and anything under that is statistically significant. So you say, wow, there is a relationship between temperature and consumption. Again, as I said before, temperature and consumption can be swapped for anything: that could be temperature and gas prices, that could be temperature and vacations, right?
So as temperature goes up, going to the beach increases, you know. Then when we start to get to national security and law, we're going to be applying this to things that are important to the policymakers doing real research. And that's what makes this very, very important, using large data, artificial intelligence, etc. So there's 30 points here; in our guerrilla data set, when we get to use that, there's thousands.
We're going to do even Trump tweets as an example. I'm going to explain that. But for now you've got to look at consumption, price, and temperature. That's our big data.
We're seeing if there's a relationship. And here you go. And then next, we're going to plot it and also do price in order to see the relationship between these variables. Thanks a lot for listening.
And don't worry, we're going to get right to national security.