Transcript for:
CLT Overview and Conditions

We are now going to start the second of our overlapping topics, the Central Limit Theorem, and we all know how important the Central Limit Theorem is. We used those Central Limit Theorem conditions over and over and over again throughout Chapters seven and eight when studying sample proportions, and honestly, the Central Limit Theorem is going to be just as important when studying sample means. So, once again, I'm going to do my best to emphasize the similarities in the Central Limit Theorem, regardless of whether we're looking at proportions or means, and then, of course, really emphasize what is different. All right, here we go, the Central Limit Theorem again. What is the best part about the Central Limit Theorem? It allows us to approximate a sampling distribution without having to run multiple simulations. That idea has not changed: if my sample meets some very basic conditions, I only need that one sample to represent what my sampling distribution will look like. Once again, if those basic conditions are met, showing I have a good sample, the theorem gives me some really awesome results about my sampling distribution: its shape will be normal, and I will be able to calculate its center and spread using the formulas we just talked about. What I just highlighted in yellow emphasizes the overarching ideas about the Central Limit Theorem that hold whether we are looking at sample means or sample proportions. And ultimately, the Central Limit Theorem is once again going to have a part one and a part two. Part one is checking conditions to determine whether we have a good sample; once again, part one is about making sure we have a good sample. 
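That "one good sample instead of many simulations" idea can be checked with a quick simulation sketch. This is my illustration, not part of the lecture: I picked a skewed (exponential) population and made-up numbers, and the simulation confirms that the sample means pile up with the center and spread the theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: exponential, so skewed and NOT normal,
# with mean mu = 2 and standard deviation sigma = 2 (made-up values)
mu = sigma = 2.0
n = 40  # n >= 25, so the normality condition is satisfied anyway

# Draw many samples and record each sample's mean
sample_means = rng.exponential(scale=mu, size=(10_000, n)).mean(axis=1)

# CLT predictions: center close to mu, spread close to sigma / sqrt(n)
print(round(sample_means.mean(), 2))
print(round(sample_means.std(), 2), round(sigma / n**0.5, 2))
```

In practice the theorem means you skip this whole simulation: one sample that meets the conditions is enough.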
And part two, once again, is looking at the results for my sampling distribution. Now, when it comes to part one and the Central Limit Theorem conditions, the two conditions that are exactly the same are the random sample and the large population. I want to emphasize that condition one and condition three are exactly what we saw for the Central Limit Theorem back in Chapters seven and eight: simply making sure you're told the sample was collected randomly, and then making sure your population is at least 10 times the sample size. Exactly the same as Chapters seven and eight. So, here's the first big difference when we are looking at sample means and studying the Central Limit Theorem: the second condition. This second condition now asks the question of normality. It's not "large sample"; it's asking for normal. Now, in the context of numerical data and sample means, what does it take to satisfy normality, that second condition? Either the population has a known normal shape, yeah, that bell-shaped curve, given in the prompt to be normal, or the sample size is greater than or equal to 25. And I want you to note the words "either" and "or": they emphasize that you only need one to hold. So, what does that mean? Well, if the only thing you are given in the prompt is that the population is normal in shape, you're good to go, you've satisfied normality. Or if, say, the only thing you are given is that the sample size is at least 25, that still satisfies the "or"; you only need one of those conditions to hold. Or, you know what, if both hold, if the population is normal in shape and the sample size is at least 25, great. 
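The three conditions above can be written out as a small checklist in code. This is just a sketch of the lecture's checklist; the function and parameter names are mine, not from the course.

```python
def clt_conditions_met(random_sample, population_size, n, population_is_normal):
    """Check the three CLT conditions for sample means (illustrative names)."""
    random_ok = random_sample                    # condition 1: random sample
    normal_ok = population_is_normal or n >= 25  # condition 2: normality ("either/or")
    large_pop_ok = population_size >= 10 * n     # condition 3: population >= 10 * n
    return random_ok and normal_ok and large_pop_ok

# Only one side of the "either/or" needs to hold:
print(clt_conditions_met(True, 5_000, 40, False))  # n >= 25 alone -> True
print(clt_conditions_met(True, 5_000, 10, True))   # normal population alone -> True
print(clt_conditions_met(True, 5_000, 10, False))  # neither holds -> False
```

Note that `or` in the second line does exactly what the lecture says: one side being true is enough.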
Any one of those three options satisfies the normality condition, at which point you're probably thinking, "Wait, this feels too easy, Shannon, this feels like a trick. Is it really as easy as looking in the prompt and seeing if the word 'normal' is given?" Yeah. "Shannon, is it really that easy, just finding the sample size and making sure it's at least 25?" Yeah. "Wait, Shannon, are you saying I only need one of them to hold to satisfy the normality condition, I don't even need both?" Yeah. And so, while the normality condition is different, it's easier in a lot of ways; the Central Limit Theorem for sample means is so much easier. All right, before we move on to the results, I really want to drive home the similarities and differences. When it comes to the Central Limit Theorem conditions, the similarities are that your sample must be random and your population needs to be large enough. The big differences come from studying sample proportions versus sample means. Here in Chapter nine, when we are looking at numerical data, that second condition is about checking normality, where either you are given that the population is normal in shape or the sample size is at least 25, and you only need one or the other. Chapter nine's second condition is easier than back in Chapter seven, when we were looking at proportions and categorical data, where that second condition was way harder: checking for a large sample required both that the number of successes be greater than or equal to 10 and that the number of failures also be greater than or equal to 10. I wanted to show you this side by side so that you can clearly see the big difference. 
And honestly, this second condition is probably the biggest difference when it comes to the Central Limit Theorem. Now, the results of the Central Limit Theorem once again describe the sampling distribution, giving results about shape, center, and spread, where the shape will once again be normal. Those are the similarities in the results: the shape will still be normal, and you'll still be given information about the center of that normal curve and its spread. But that's pretty much where the similarities end. Because here in Chapter nine, when we're studying the Central Limit Theorem for sample means, that normal shape will again have a center coming from the population, except now it is the population mean. And the spread of that normal-shaped sampling distribution will once again be the standard error, except that standard error formula is a vastly different formula; it's the formula we've now learned in Chapter nine. So again, the results of the Central Limit Theorem share the same three general ideas of shape, center, and spread, and it's still the same shape, but what's different is how we actually calculate that center and that spread. Let me do a quick side by side here. For sample means, the center, the average of all of the sample means, is going to equal the population mean. And for sample means, the spread is going to be the standard error formula of sigma divided by the square root of n, where again sigma is the population standard deviation. 
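Those two results, center equals the population mean and spread equals sigma over the square root of n, can be sketched in a couple of lines. The population numbers here are made up for illustration; they are not from the lecture.

```python
import math

# Illustrative population parameters (made up for this example)
mu = 100      # population mean
sigma = 15    # population standard deviation
n = 36        # sample size

center = mu                    # center of the sampling distribution of x-bar
spread = sigma / math.sqrt(n)  # standard error = sigma / sqrt(n)

print(center)   # 100
print(spread)   # 15 / 6 = 2.5
```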
It makes a lot of sense that the center and spread for numerical data are calculated from the mean and standard deviation of my original population. Why? Because when it comes to numerical data, center is described by the mean and spread is described by the standard deviation; that's what we learned in Chapter three. So it makes sense that the sampling distribution's center and spread use the population mean and standard deviation, respectively. Back in Chapter seven, with sample proportions, the center of the sampling distribution of all the sample proportions was ultimately the population proportion, so in a lot of ways the similarity here is that the center uses the population parameter. But spread, oh man, guys, when it comes to spread, those standard errors could not be more different. The standard error for a sample proportion, again, was that huge square root expression, which is totally, totally different from the standard error formula in red. So, again, I wanted to give you a side by side so you could see how Chapters seven and eight in green, when we studied proportions and were ultimately looking at categorical data, give vastly different results, purely because we're looking at proportions with categorical data versus means and standard deviations with numerical data. I've had multiple students tell me this page was really, really helpful to have on the final exam, because the final exam is comprehensive: it will include questions from Chapter seven and questions from Chapter nine, and being able to keep these differences straight was really helpful for students. Instead of having you reinvent the wheel, this page is pretty much a nice outline of how to keep things separate. 
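The side-by-side contrast between the two standard error formulas can be sketched as follows. The function names and the numbers plugged in are mine, chosen just to show how different the two calculations are.

```python
import math

def se_proportion(p, n):
    """Chapter 7/8 standard error for a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(sigma, n):
    """Chapter 9 standard error for a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Same sample size, two very different formulas (illustrative numbers)
print(round(se_proportion(0.4, 100), 3))   # 0.049
print(se_mean(12, 100))                    # 12 / 10 = 1.2
```

Notice the proportion formula only needs p and n, while the mean formula needs the population standard deviation sigma; that mirrors the categorical-versus-numerical split the lecture emphasizes.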
And so now, in Chapter nine, we're honestly going to set aside everything we learned that was specific to categorical data. We're not going to worry about any of the ideas highlighted in green; we're really going to focus on utilizing the Central Limit Theorem when looking at numerical data.