Hi everyone, this is Matt with Intro Stats. Today we're continuing our discussion of p-values. We've been working through the basics of how a hypothesis test works, and in the last video we saw the idea of the p-value: how to read it and how to compare it to your significance level. Now I want to talk a little bit about how you actually calculate this thing.

Last time we talked about how vital the definition of the p-value is: it's the probability of getting the sample data (the sample statistic) or something more extreme, by sampling variability, if the null hypothesis were true. We're going to unpack this definition a little, because in a lot of ways it gives us a blueprint for how the p-value is actually calculated.

There are two main ways p-values are calculated: traditional approaches, and more modern approaches. I'm going to start with the more modern one, which we sometimes call randomized simulation, or a randomization approach. Think about it this way: the p-value definition says, if the null hypothesis were true, what would we be likely to see just by sampling variability? Those are the two key components: we have to see what sampling variability would look like if the null hypothesis were really true. So in a simulation, the computer creates what is in a lot of ways a simulated sampling distribution, but under the premise that the null hypothesis is true. If the null hypothesis were true, what kinds of random samples would we be likely to get from that population, just by sampling variability? That's the key idea behind a randomization approach.
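Written out as a probability statement (the notation here is mine, not the video's), that definition is:

    p-value = P( sample statistic at least as extreme as the one observed | H0 is true )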
Let's look at an example and flesh this out a little. Obviously this is all going to be done on the computer; I like to use StatKey for randomized simulation. I think it's fabulous, it works great. Suppose we have a null and alternative hypothesis from an example we looked at before: the null says normal body temperature is 98.6 degrees Fahrenheit, but nowadays some people think normal body temperature is actually less than 98.6 degrees Fahrenheit. The computer is going to assume the null hypothesis is true, so it assumes normal body temperature really is 98.6, and then it starts creating random samples under the premise that 98.6 is the population mean. We're assuming the population mean is 98.6 degrees Fahrenheit and trying to see what kinds of random samples we'd get. The computer usually takes random samples of the same size as your real sample data, and then calculates whatever statistic you happen to be measuring. We're dealing with a mean here, so the one we want to compare to is the actual sample mean: the real sample data gave us x-bar = 98.26. That's a summary of what really happened.

Now the computer simulates random samples under the premise that 98.6 is the population mean, and you start to get sample means. It takes lots and lots of random samples and calculates a sample mean for each; when I do this in StatKey I usually run thousands. Think of each of these dots as a sample mean: they were all created by the computer, and they show us what sampling variability looks like for sample means. In a lot of ways this is a simulated sampling distribution under the premise that the null hypothesis is true. Notice how the center of the simulation is 98.6, because that's what we assumed in the null hypothesis. Remember when we learned about sampling distributions: the center is usually the population value, and in this case the population value is whatever we're assuming in the null hypothesis, so the null population mean sits at the center of the distribution. But even though the center is 98.6, I got simulated sample means anywhere from about 98.2 all the way up to 98.9. There's a lot of sampling variability here. In a sense, this graph is what we expect to see by sampling variability if the null hypothesis were true.

Going back to the definition: now we can ask what's the probability of getting the sample data, the sample statistic, or something more extreme, by sampling variability. "The sample statistic" means the real one, what really happened: our real sample data gave us a sample mean of 98.26 degrees Fahrenheit. The thing about simulation is that you'll have hundreds or thousands of simulated sample means, but only one real sample mean (or thousands of simulated sample proportions, but only one real one), so don't lose track of which one is your real sample mean. The p-value is really the probability of getting this sample mean, 98.26, plus that other part, "or more extreme." There are about 250 dots on the board, each one a simulated sample mean from a random sample taken under the premise that the null hypothesis is true. Find 98.26; that's the cutoff. Notice our alternative hypothesis was "less than," so this is a left-tailed test. That's really important in your calculations: you need to know whether it's a left-tailed, right-tailed, or two-tailed test, and remember that "less than" in Ha points to the left. So now we look for simulated sample means that were at 98.26 or below it, and there were actually two. There might not be any that were exactly 98.26, but I had two that were lower. Think of it this way: if 98.26 is significantly different from the claimed population mean of 98.6, then anything farther out would be even more significantly different, so I need to include any simulated samples that disagree with the null hypothesis even more than my original sample data does. It's not just the cutoff point; it's everything else in the tail too, and that's what "or more extreme" is referring to in the definition of the p-value. So if I had 250 total simulated samples and only two of them were at the real sample mean of 98.26 or less, my p-value would be approximately 2 out of 250, or 0.008. That's my simulated p-value, and this is probably the most direct way to find a p-value if you really want to get at the definition.
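Here's a minimal sketch of this whole procedure in Python. The video doesn't show StatKey's internals or the raw body-temperature data, so the sample values below are made up, and I'm using one standard randomization scheme for a single mean: shift the real sample so its mean equals the null value of 98.6, then resample it with replacement. Each resampled mean plays the role of one dot on the plot.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical body-temperature data; the video only reports the real
# sample mean (98.26), so these values are invented for illustration.
sample = np.array([97.8, 98.1, 98.3, 98.0, 98.6, 98.4, 97.9, 98.5,
                   98.2, 98.7, 98.3, 98.0, 98.4, 98.1, 98.6, 97.7])
observed_mean = sample.mean()

null_mean = 98.6  # the population mean assumed true under H0

# Shift the sample so its mean equals the null value; resampling from
# this shifted data simulates samples "as if the null were true".
shifted = sample - observed_mean + null_mean

n_sims = 10_000
sim_means = np.array([rng.choice(shifted, size=len(shifted)).mean()
                      for _ in range(n_sims)])

# Left-tailed test (Ha: mu < 98.6): the p-value is the fraction of
# simulated means at or below the real sample mean ("or more extreme").
p_value = np.mean(sim_means <= observed_mean)
print(f"observed mean = {observed_mean:.2f}, simulated p-value = {p_value:.4f}")
```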
Okay, let's look at another one. Now we're looking at a proportion simulation. The null hypothesis is pi = 0.5 and the alternative is that pi is not equal to 0.5, so this is a two-tailed test; like we talked about before, it's two-tailed because the alternative is "not equal." That's going to be very important in your p-value calculation: you have to consider both tails now, not just the right tail or just the left tail. It's also really important to know what my real sample data was: my real sample gave a p-hat, a sample proportion, of 0.61. What the computer is going to do is create lots and lots of random samples under the premise that the null hypothesis is true; it assumes the null hypothesis is true and creates random samples based on that premise. What you get is, again, a simulated sampling distribution. Notice that its center is pretty close to 0.5, because that's what the computer assumed was true when it created all these samples. For each dot up here, the computer generated a random sample and then calculated the p-hat, the sample proportion, for that sample. So each of these dots is a simulated sample proportion; think of it like a p-hat. I've got 250 p-hats on the board that the computer created, but there's only one real one, what really happened, and that real p-hat was 0.61. That's where the p-value comes into play, because the p-value asks: what's the probability of getting this real sample p-hat or something more extreme? Again, this is a graph of sampling variability if the null hypothesis were true, so now I just have to figure out where 0.61 falls on the graph, since that's my real p-hat. It's about right here. Remember, we have to find the probability of getting the sample statistic or more extreme, so I have to count anything that disagrees with the null hypothesis even more, anything even farther from the null value: not just the dots at 0.61, but all the p-hats that are more extreme than 0.61, farther from the middle than 0.61 is. Counting all the dots out in that tail, there were about 48 of them, and 48 divided by 250 is 0.192. Now, if this were a right-tailed test, that 0.192 would be my p-value. But it's not; it's a two-tailed test, which means I also have to incorporate any simulations in the other tail that are equally far from the null value, the center of the distribution.
If I look at all the simulated p-hats in the left tail that are at least as far away and count them, I'd also get about 48 out of 250, or 0.192. So how do I get my p-value? You have to incorporate all of them: both tails together, that's 48 + 48 = 96 extreme samples out of 250, which gives a two-tailed p-value of about 96/250 = 0.384.
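Here's the same idea as a sketch for the two-tailed proportion test. Under the null hypothesis every observation is like a fair coin flip, so we can simulate p-hats straight from a binomial distribution. The sample size n = 50 is a made-up placeholder (the video never states it), and the exact tail counts you get depend on it and on the random draws.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

null_p = 0.5          # proportion assumed true under H0
observed_phat = 0.61  # the one real sample proportion from the video
n = 50                # hypothetical sample size; not given in the video

# Each simulated p-hat comes from n "coin flips" with success chance 0.5.
n_sims = 10_000
sim_phats = rng.binomial(n, null_p, size=n_sims) / n

# Two-tailed test (Ha: pi not equal to 0.5): count simulations at least
# as far from the null value as the real p-hat, in EITHER direction.
distance = abs(observed_phat - null_p)
p_value = np.mean(np.abs(sim_phats - null_p) >= distance)
print(f"two-tailed simulated p-value = {p_value:.4f}")
```

Measuring distance from the null value handles both tails in one step, which is exactly the "incorporate all of them" idea: anything as far out as 0.61 on either side counts toward the p-value.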