Transcript for:
Sampling Distribution Concepts

Remember, Chapter 3 was all about looking at numerical data and asking, "How can I summarize it?" We said you can summarize numerical data using three ideas: shape, center, and spread. That means we can describe this sampling distribution, literally this huge, huge pile of proportions, by organizing those proportions and looking at their shape, center, and spread. Let's talk about shape first. When it comes to sampling distributions of proportions, the shape is going to be approximately normal, as long as the sample is large enough. This is something we're just going to have to accept as fact. It's proven using calculus-based statistics, and frankly, you don't need to understand why, but I want you to understand the end result: when you're looking at a sampling distribution, its shape will be approximately normal. And why is it important that the shape is normal? Well, remember: normal means symmetric. And when your data is symmetric, your center will be the mean and your spread will be the standard deviation. So the questions then are: what is the mean of this sampling distribution, and what is its standard deviation? That's where the key points are going to give us a hand again. I'm not going to explain the proof behind what I'm about to present, because again, it requires calculus-based statistics. But I can at least give you the result, which sometimes is all we want. Here's the first one: the mean. The mean of my sampling distribution means literally taking the mean of all the sample proportions, so let's go back to my dice example.
Alright, let's say I take all of my sample proportions and I want to take their mean. That little p-hat next to it is saying: take the mean of all of the sample proportions I found, one for each sample. So literally, I'm going to add them all up and divide by five, since there are five samples. Can you guys find that number for me? Yeah, we get 0.17. What this rule is saying is that if you take all of your sample proportions and average them (add them up, divide by five), the result will be really, really close to the population proportion, p. Notice that the population proportion, the true, theoretical proportion of landing on a five, is 1/6 ≈ 0.167, and notice how freaking close those two are. So, so close. So here's the idea: the mean of all sample proportions equals the population proportion, p. That's the first of two important formulas: the mean of the sampling distribution is equal to the population proportion, p. Guys, this is astounding, because that's exactly what we want. We want all of these proportions to be spinning around the population proportion, much like planets spinning around the sun, so that if you average them all up, you get the population proportion. So we've taken care of the mean of this sampling distribution. The second question, then, is: what is the standard deviation of the sampling distribution? I know there is so much terminology in this lecture today, but the standard deviation of the sampling distribution is what we call the standard error.
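To see the "mean of p-hats equals p" rule without the calculus, here is a small simulation sketch. It assumes a fair six-sided die and, as hypothetical choices not taken from the lecture, 5,000 samples of 60 rolls each (the in-class example used only five samples, and the number of rolls per sample isn't stated). Each sample's proportion of fives is one p-hat; averaging them should land very close to 1/6.

```python
import random
import statistics

def sample_proportion(rng, n, face=5):
    """Roll a fair six-sided die n times; return the fraction of rolls equal to `face`."""
    hits = sum(1 for _ in range(n) if rng.randint(1, 6) == face)
    return hits / n

def simulate_p_hats(num_samples, n, seed=0):
    """Build a sampling distribution: one sample proportion (p-hat) per simulated sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sample_proportion(rng, n) for _ in range(num_samples)]

p = 1 / 6  # population proportion of rolling a five on a fair die
p_hats = simulate_p_hats(num_samples=5000, n=60)
print(f"mean of p-hats: {statistics.mean(p_hats):.4f}  vs  p = {p:.4f}")
```

With thousands of samples instead of five, the average of the p-hats typically sits even closer to 0.167 than the in-class 0.17 did.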
Alright, remember earlier I mentioned to you guys that error is bad, so we want to make it as small as possible. We call this standard error because the "error" here is a deviation, and we put the word "standard" in front of it to echo "standard deviation." Standard error is simply the standard deviation of the sampling distribution, and it's given by this formula: the square root of p times (1 minus p), divided by n, where n is the sample size, that is, √(p(1 − p) / n). Now, real talk: we just got a formula for the mean of my sampling distribution and a formula for the standard deviation of my sampling distribution, and this is a lot to remember. That's why I want you guys to remember you're allowed a note sheet for the exam. So don't be a hero and feel like you need to memorize the formula for the mean or the formula for standard error. Put them on your note sheet, so that you're using your brain space for thinking about how to use them. So what's one of the things you can see in the standard error formula? I want you to notice that the sample size, n, is in the bottom of the fraction. Why does that matter? Well, if my sample size, the bottom of the fraction, gets larger, the overall fraction gets smaller. We all know getting a third of a pizza is a way bigger slice than getting one twenty-fifth of a pizza. That gives us the idea we said earlier: the larger the sample size, the smaller the fraction, and the smaller the standard error. Now again, how this formula was developed is calculus-based, so don't worry about the derivation.
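Here is a minimal sketch of the standard error formula √(p(1 − p) / n), checked against a simulation. The function names, the sample sizes 25, 100, and 400, the seed, and the 2,000 simulated samples are all illustrative assumptions. The point it shows is the "n in the denominator" idea: as n grows, the standard error shrinks, and quadrupling n cuts it in half because of the square root.

```python
import math
import random
import statistics

def standard_error(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def simulated_sd(p, n, num_samples=2000, seed=1):
    """Empirical SD of simulated sample proportions, for comparison with the formula."""
    rng = random.Random(seed)
    # Each p-hat is the fraction of n independent trials that succeed with probability p.
    p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]
    return statistics.stdev(p_hats)

p = 1 / 6  # proportion of rolling a five on a fair die
for n in (25, 100, 400):
    print(f"n={n:4d}  formula SE={standard_error(p, n):.4f}  simulated SD={simulated_sd(p, n):.4f}")
```

The formula column and the simulated column should agree closely, and each step from 25 to 100 to 400 halves the standard error.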
What I want you to do is, one, put that formula on your note sheet, and two, see that because the sample size is in the bottom of the fraction, we are still satisfying the idea: the larger the sample size, the smaller the error. Lastly, the population size won't have an effect on the sampling distribution. We could be running a problem about millions of Americans, or we could be running a problem about a thousand sea lions. Either way, the size of the population has no effect on the distribution. But these formulas are specifically designed for problems where the population is huge. So a general rule of thumb is that we want the population size to be at least 10 times larger than the sample size. The key point here is that these two formulas only make sense when your population is large enough, at least 10 times the sample size.
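The "at least 10 times larger" rule of thumb is easy to check before using the formulas. This is a minimal sketch; the function name and the sample sizes are hypothetical, and the 1,000-sea-lion population comes from the example above.

```python
def population_large_enough(population_size, sample_size):
    """Rule of thumb: population should be at least 10 times the sample size."""
    return population_size >= 10 * sample_size

# A population of 1,000 sea lions (hypothetical sample sizes):
print(population_large_enough(1_000, 100))  # True: 1,000 >= 10 * 100
print(population_large_enough(1_000, 150))  # False: 1,000 < 10 * 150
```

If the check fails, the mean and standard error formulas from this lecture shouldn't be applied as-is.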