Transcript for:
Understanding Sampling Distributions and CLT 7 of 12

Module 20, Distribution of Sample Means, 7 of 12.

Let's compare and contrast what we know about the sampling distributions for sample means and sample proportions. We have the variables here in this column: categorical and quantitative. One example of a categorical variable is gender; an example of a quantitative variable is age.

We then have the parameters: p is the population proportion, mu is the population mean, and sigma is the population standard deviation. We have the statistics: p-hat is the sample proportion and x-bar is the sample mean.

We then have the sampling distribution. When we're looking at a proportion, we know that the center of the sampling distribution of sample proportions is going to be p, and the center of the sampling distribution of sample means is going to be mu. Here we have the measure of spread and how to calculate it: for proportions it is the square root of p(1 - p)/n, and for the sampling distribution of sample means it is sigma over the square root of n.

For the shape, when we're working with a proportion, we know that it's going to be approximately normal as long as the following conditions are met: n times p is greater than or equal to 10, and n times (1 - p) is greater than or equal to 10. When we are looking at the sampling distribution of sample means, the question is: when will the distribution of the sample means be approximately normal?

Here are some common features of the sampling distributions for proportions and for means. The mean of the sampling distribution is the population parameter; that is the center of the sampling distribution. The standard deviation of the sampling distribution is affected by the sample size, with statistics from larger samples having less variability. Under certain conditions, the shape of the sampling distribution is normal. We now investigate the conditions that guarantee a
normal sampling distribution for sample means.

Shape of the Sampling Distribution of Means

When we discussed the sampling distribution of sample proportions, we learned that this distribution is approximately normal under certain conditions. We had to check that these conditions were met in order to determine that the distribution was approximately normal. In other words, we had a guideline based on sample size for determining the conditions under which we could use a normal curve to do probability calculations for sample proportions.

Now let's investigate these questions: when will the distribution of sample means be approximately normal? This is the question that we are currently trying to answer. Does it depend on the size of the sample? What happens if the distribution of the variable in the population is heavily skewed? The following simulation video helps us investigate these questions.

We now finish our discussion of sample means. In particular, we will investigate these two related questions: what conditions guarantee a normal distribution of sample means, and what is the impact of sample size? From our previous investigations, we know that when a variable is normally distributed in the population, sample means will also be normally distributed. So for this investigation, we will look at a population with a skew.

Here we have the commute times for individuals living in a small town in Florida. This data comes from the 2000 U.S. Census. We can see that commute times are skewed to the right, with a mean of 25 minutes and a standard deviation of 20 minutes. In the graph, I have marked the mean and one standard deviation below and above the mean. I will do this in all the distributions in this movie to give us a quick visual representation of center and spread, but my dialogue is going to focus on the shape.

In the first sampling, I'm going to use a small sample size of five. I am doing this to see whether or not the sample means will be normally distributed even
for small samples, or whether the distribution will look more like the distribution of commute times for individuals in the population and be skewed.

Before I animate the simulation, let's double-check that we understand what we see on the screen. Here you will see each individual sample. Each sample will contain five individuals who have been randomly selected from the population. The blue line will indicate the mean of these five individual commute times, and you will see the sample mean recorded in blue. Each sample mean will be graphed above, and in this way we will generate the sampling distribution of sample means.

I'm going to animate the simulation now, but it's going to go pretty quickly since we're used to this kind of thing. While the simulation is running, I want you to notice two important things that I've already mentioned: each sample has five people in it, and for each sample we're calculating a sample mean. Sample means are being collected above, and in this way we're generating the sampling distribution.

Now pause the movie and do two things: one, predict the shape of the sampling distribution, and two, check your understanding of concepts presented previously by predicting the mean and the standard deviation of the sampling distribution.

Let's see what happened. Did you predict correctly? It looks like the sample means, when we use five people per sample, have a skew to their distribution. We can also see that the mean of the sample means should be about 25; it was actually 24.6 when I calculated it. The standard deviation of sample means should be about 8.9 according to theory; it was about 8.7 when I calculated it. Now we're going to see what happens as we increase the sample size.

All right, now we're going to increase the sample size to 30 individuals. Again, I'm going to animate the simulation pretty quickly. What you will see is groups, or samples, of 30 individuals being collected. For each sample, again, we are calculating a sample mean, and sample means are
graphed above. In this way we're going to generate a second sampling distribution of sample means. The difference, of course, is that each sample is now larger: each sample contains 30 people.

Now pause the movie again and do what you did previously: predict the shape of the sampling distribution, and check your understanding of previous concepts by predicting the mean and the standard deviation.

Let's see what happened. I collected thousands of samples, and each sample had 30 people in it. You can see by looking at the graph that the sample means are again centered at 25, and the standard deviation has decreased to only 3.7. Here's the actual standard deviation calculated from the sample means. Of course, what we're interested in is the shape. Notice that the shape looks normal.

Here I've graphed the same sampling distribution, but I've spread out the x-axis so that we can see the shape better. I have used a mathematical model to graph the normal curve, and we can see that the normal curve fits the sampling distribution well. In fact, a normal curve looks like it would be a good probability model here: the area under the curve gives a pretty good approximation of the frequency of sample means. What a relief! We've seen that if we increase the sample size to 30, sample means will be normally distributed.

Discussion of the Simulation

Are you surprised that a variable with a skewed distribution in the population can have a sampling distribution that is approximately normal? This discovery is probably the single most important result presented in introductory statistics courses. It is so important that it has a name: the central limit theorem.

The central limit theorem says that for large samples, the sampling distribution of sample means is approximately normal. This theorem is important because inference procedures such as hypothesis tests and confidence intervals are based on a normal model for the sampling distribution. The central limit theorem assures
us that we can use a normal probability model for sample means without knowing anything about the shape of the distribution of the variable in the population. All we have to do is collect large samples.

How large a sample size do we need to assume that sample means will be normally distributed? It really depends on the population distribution. As we saw in the simulation, the more skewed the distribution in the population, the larger the samples we need in order to use a normal model for the sampling distribution. The general guideline is that samples of size greater than 30 will give sample means with a fairly normal distribution, regardless of the shape of the distribution of the variable in the population. But if a population is strongly skewed, it is safer to use larger samples.

So here is the summary. We are now ready to complete the table comparing the sampling distributions for proportions and for means by describing the conditions for a normal shape. When we are working with the sampling distribution of sample means, we can say that the shape of the distribution will be approximately normal as long as n is greater than 30, and it is always normal if the population is already normal.

In summary, for large enough sample sizes, the sampling distribution of means is approximately normal, even if the distribution of the variable in the population is not normal. If a variable has a skewed distribution for individuals in the population, a larger sample size is needed to ensure that the sampling distribution has a normal shape. The general rule is that if the sample size n is more than 30, then the sampling distribution of means will be approximately normal. If the variable is normally distributed in the population, then any sample size will produce a normal-shaped distribution.
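
The simulation described in this transcript can be approximated with a short script. What follows is a minimal sketch, not the code behind the video: the census commute-time data is not available here, so the population is assumed to be a gamma distribution chosen only because it is right-skewed and can be given a mean of 25 and a standard deviation of 20. The function names, the number of samples (5000), and the seed are illustrative choices as well.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples=5000, pop_mean=25.0, pop_sd=20.0, seed=0):
    """Return the means of num_samples random samples, each of size n.

    ASSUMPTION: the population is modeled as a gamma distribution, a
    right-skewed stand-in for the commute times (not the real census data).
    """
    rng = random.Random(seed)
    # For a gamma distribution: mean = shape * scale, sd = sqrt(shape) * scale.
    shape = (pop_mean / pop_sd) ** 2
    scale = pop_sd ** 2 / pop_mean
    means = []
    for _ in range(num_samples):
        sample = [rng.gammavariate(shape, scale) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


if __name__ == "__main__":
    for n in (5, 30):
        means = simulate_sample_means(n)
        print(
            f"n={n}: mean of sample means={statistics.fmean(means):.2f}, "
            f"sd of sample means={statistics.pstdev(means):.2f}, "
            f"theory sd (sigma/sqrt(n))={20 / math.sqrt(n):.2f}, "
            f"skewness={skewness(means):.2f}"
        )
```

Running the script prints one line per sample size. Under these assumptions, the standard deviation of the sample means should land near sigma over the square root of n (about 8.9 for n = 5 and about 3.7 for n = 30), and the skewness should be noticeably closer to zero for n = 30 than for n = 5, which is the pattern the central limit theorem predicts.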