Transcript for:
Handling Unknown Population Proportions

Now, there's only one thing here that we're all probably feeling a little uncomfortable with, and it's this: Shannon, you told me that the point of even gathering this sample was to understand the population. And yet, Shannon, in these results we are literally using the population proportion, p, in these formulas. And you know, Shannon, we had an example where you gave me that population proportion, but what you're telling me is that I want to use a good sample to understand my population, which is normally unknown. So, Shannon, how does this work? How does it work if the population proportion p is an unknown value? What do I do with that?
And that's where we get to the next page. Again, more often than not, you're actually not going to know the value of p, people. More often than not, that's actually what we're trying to understand. And so, the question is, what do I do then? What do I do if I don't know the value of p?
And that's where we're going to go back and remember the point of the conditions. Remember, the point of the Central Limit Theorem conditions was to establish that we made a good sample. Why? Because I would have already proven the sample satisfies all the conditions of the Central Limit Theorem. The point of the conditions of the Central Limit Theorem is to prove to us that we have a good sample. Why is that so important? Well, the fact that I have a good sample means I can substitute for the value of p. Meaning, if we go back to our formulas above, instead of using p, I will now use the sample proportion, p-hat, to find my center. Instead of using p, I will now use the sample proportion, p-hat, in my formula for standard error. Why is this powerful, guys?
Well, because it means that the center and the spread are calculated using information only from the sample. I want you to see here that the center and spread are now calculated using p-hat, which is the sample proportion. Notice how n is the sample size. Notice how those are literally the only variables in each of these formulas. And what I want you to see is that they both come from the sample. The idea is that because we have a good sample, we can use info that comes only from that good sample. Again, why is this so powerful? It's because what we're doing is checking my sample and making sure it's good enough that it will, in fact, represent my population.
And so, the reason why I need to emphasize this is that it really drives home the fact that these Central Limit Theorem conditions are critical. These Central Limit Theorem conditions are literally what tell me whether I can move forward or not. And so, knowing how they are satisfied is incredibly important.
So again, let's go over these conditions one more time. The three conditions of the Central Limit Theorem are random, large sample, and large population. Now, to completely level with you, randomness is simply a matter of looking somewhere in the prompt for a statement that the sample was randomly collected, right? In this particular statistics class, that's all I need you to do to confirm the sample was collected randomly. The prompt literally just needs to tell us that. And the reason for that is because random sampling is really about how researchers go into the field and gather the data. That's what you're going to do if you use statistics in your future major or career, and it's going to be up to you to do it right. And so, ultimately, for the context of this class, randomness is practically always going to be a given, honestly.
Similarly with large population, any population we're going to study from here on out is going to be very large. We're still going to need to check that it is, in fact, at least 10 times the sample size, but pretty much any population we're going to study is going to be a large population. So, in a lot of ways, conditions one and three are really easy to identify. The condition that's honestly going to be the hardest is large sample, and large sample is the condition that more often than not doesn't hold. Large sample is the condition that usually gives everyone trouble. And so, because of that, it's really important to understand how to identify whether my sample is large.
All right. And so, what we're going to do in the next couple of examples, A and B, is discuss how to check the large sample condition. We're going to focus on how I determine whether my sample is large enough because, honestly, condition two is the weirdest of them all.
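To recap the two pieces of arithmetic from this section, here is a minimal sketch in Python. It assumes the standard error formula referred to above is the usual sqrt(p(1 - p) / n), with p-hat substituted for p, and it treats "large population" as the population being at least 10 times the sample size. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def estimate_center_and_spread(p_hat, n):
    """Estimate the center and standard error using only sample information.

    p_hat is the sample proportion and n is the sample size. The population
    proportion p is unknown, so p_hat stands in for it in both formulas.
    """
    if not 0 <= p_hat <= 1:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    center = p_hat  # p_hat replaces p as the center
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # p_hat replaces p here too
    return center, standard_error


def is_large_population(population_size, n):
    """Check that the population is at least 10 times the sample size."""
    return population_size >= 10 * n


# Hypothetical example: 600 people sampled, 40% of them answer "yes".
center, standard_error = estimate_center_and_spread(0.40, 600)
print(center, standard_error)

# Hypothetical population of 10,000 people: is it at least 10 * 600?
print(is_large_population(10_000, 600))
```

Notice that both functions take only values you can read off the sample (p-hat and n), plus the population size for the large-population check. The large sample condition is left out on purpose, since that is what the next examples cover.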