What Was Ultimately the Point of Hypothesis Testing?

Remember, the point of hypothesis testing was ultimately to make the right decision, and the right decision is to reject the null when the alternative is in fact true. The probability of doing that and getting it right is what we call the power of the hypothesis test. And when we reject the null hypothesis, we get a result that is called statistically significant. Why is it important to have a statistically significant result? It emphasizes that the statistical inference we just ran worked: it actually did the job of giving us evidence. So we need to ask the question: what does it take to ensure we get statistical significance? To answer that, we're going to look at an example where we do, in essence, the same experiment twice. Let's read it.

Most people assume that the probability of getting a head or a tail when flipping a coin is 50/50. We all agree with that. Now, some researchers believe a coin is actually more likely to land on the same face it started on, with a probability of 51%. So suppose we do this experiment and get a sample statistic of 51% for the face the coin started on, assuming the conditions are met. Let's test whether that statistic is significantly larger than the 50/50 ratio of getting heads or tails.

What I want you to see is that we are, in essence, doing this problem twice: once with a sample size of 700 flips, and again with a sample size of 7,000 flips. When we do the exact same experiment twice but simply change the number of flips, how will that affect the statistical significance? So let's do this. For starters, note that the hypotheses (step one of the process) are already written for us.
We know the null hypothesis is the assumption that it should be a 50/50 split on which face you land on, and the alternative hypothesis is that the coin is more likely to land on the same face it started on. Step one: already done. Step two is also already done, because we are assuming all the conditions are met. And you know what, guys? I also did step three for you. I typed all of the important values into 1-PropZTest, and I found the P-value for you.

So what I really want to focus on here is step four: comparing that P-value of 0.298 to the significance level of 0.05. Why don't you guys give me a hand with this one? When comparing this P-value of 0.298 to 0.05, which scenario do we have? Is P smaller, or is P bigger? Are we going to fail to reject, or are we going to reject the null? Yeah, I can see here that P is the bigger value. And when P is bigger, we fail to reject the null. So if we're failing to reject the null, can you tell me which conclusion we pick: there is, or there isn't, enough evidence? There is not enough evidence.

As a statistician, not having enough evidence is the biggest bummer, because not having enough evidence means your result is not statistically significant. When you fail to reject the null and don't have enough evidence, your result is not statistically significant. Ultimately, what this means is that the data you collected in these 700 flips wasn't enough. A non-significant result doesn't disprove the null or the alternative hypothesis; it simply means your sample didn't give you enough evidence. So what does a statistician do when they've done 700 flips and gotten a result that's not statistically significant? Well, you do your whole experiment again.
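If you want to check that 0.298 for yourself, here's a minimal sketch of the calculation 1-PropZTest performs under the hood, assuming a right-tailed one-proportion z-test with a sample proportion of 0.51, a null proportion of 0.50, and n = 700 (the function name here is mine, not the calculator's):

```python
from math import sqrt, erfc

def one_prop_z_test(p_hat, p0, n):
    """Right-tailed one-proportion z-test (the calculation behind 1-PropZTest)."""
    se = sqrt(p0 * (1 - p0) / n)        # standard error under the null
    z = (p_hat - p0) / se               # test statistic
    p_value = 0.5 * erfc(z / sqrt(2))   # P(Z > z) for a standard normal
    return z, p_value

z, p_value = one_prop_z_test(p_hat=0.51, p0=0.50, n=700)
print(round(p_value, 3))   # 0.298, which is bigger than 0.05
```

Since 0.298 > 0.05, we fail to reject the null, matching the result above.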
You write down your hypotheses, you check your conditions, and you do your calculations for the P-value. Except now, this statistician upped the ante and increased the sample size from 700 to 7,000, and found the new P-value. When the sample size increased tenfold, the P-value became 0.0471. Notice that when the sample size jumped from 700 to 7,000, the P-value changed dramatically as well. And so once again, I want us to compare it to the same significance level; the significance level hasn't changed in this case. Now, with a sample size of 7,000 and a P-value of 0.0471, my question for you is: is the P-value smaller or bigger than 0.05? Yeah, the P-value is smaller. And we said that means we can reject the null. And when your P-value is smaller, when you can reject the null, what does that mean for my evidence? Is there or is there not enough evidence? Yeah, there is enough evidence. And remember, having enough evidence means your results are statistically significant. So this is what we want: we want to reject the null, we want enough evidence, we want a significant result.

Note that the hypotheses didn't change, the conditions didn't change, and even the calculations we did were exactly the same: we used 1-PropZTest. And yet the significance changed from part A to part B, and it changed because of the sample size. I want you guys to see that to ensure statistical significance, what we need is a P-value that is small enough. What took us from not statistically significant to statistically significant was a small enough P-value, and what made that change was going from a sample size of 700 to 7,000, meaning the sample size was large enough.
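Here's the same calculation redone for the larger sample, a sketch assuming the same right-tailed test and the same observed proportion of 0.51:

```python
from math import sqrt, erfc

# Same test, same p_hat = 0.51, but now 7,000 flips instead of 700.
n, p_hat, p0 = 7000, 0.51, 0.50
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
p_value = 0.5 * erfc(z / sqrt(2))            # right-tail P(Z > z)
print(round(p_value, 3))   # 0.047, now below the 0.05 significance level
```

Nothing about the formula changed; only n did, and that alone pushed the P-value under 0.05.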
Now, if you're thinking, "Wait a minute, hold the phone, Shannon, you're saying that even now in chapter eight, when we're talking about statistical inference, having a large sample is still important?" Yes, yes it is. Since chapter 1, we've all agreed the bigger the sample size, the better. And now we have more and more evidence showing why: the bigger your sample size, the smaller your P-value will get. And the smaller your P-value, the higher the chance you'll be able to reject the null, have enough evidence, and have a result that is statistically significant. So at the end of the day, what was the point of this example? It was to emphasize that if you want a statistically significant result, you've got to make sure that sample size is in fact large enough.
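To see the whole trend at once, here's a hypothetical sweep over sample sizes (the intermediate n values are my own choices, not part of the lesson), holding the observed proportion fixed at 0.51:

```python
from math import sqrt, erfc

p_hat, p0 = 0.51, 0.50
results = {}
for n in [700, 1750, 3500, 7000, 14000]:
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    results[n] = 0.5 * erfc(z / sqrt(2))     # right-tail P-value
    print(n, round(results[n], 4))
```

With the sample proportion held fixed, the P-value shrinks steadily as n grows, crossing below 0.05 somewhere between 3,500 and 7,000 flips.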