Transcript for:
Understanding Confidence Intervals in Statistics

Module 16: Introduction to Confidence Intervals (5 of 6)

Connection to the theoretical sampling distribution and normal model.

For inference procedures, we work from a mathematical model of the sampling distribution instead of simulations, but we always begin our discussion with a simulation to highlight the sampling process. Simulations also remind us that the sampling distribution is a probability model, because the sampling process is random and we look at long-run patterns.

Recall from Distribution of Sample Proportions our discussion of the mathematical model for the sampling distribution of sample proportions. For samples of size n, the model has the following center and spread, both of which are related to a population with a proportion p of successes. For the center, the mean of the sample proportions is p, the population proportion. For the spread, we look at the standard deviation of the sample proportions, also called the standard error, and we find it using this formula: the square root of p(1 - p)/n.

For the shape, a normal model is a good fit for the sampling distribution if the expected numbers of successes and failures are each at least 10. We can translate these conditions into formulas: n × p is at least 10, and n × (1 - p) is at least 10. So in order to use this normal model, we need to make sure that both of these conditions are satisfied.

If we can use a normal model for the sampling distribution, then the empirical rule applies. It is only when we can use a normal model that we can apply the empirical rule.

Recall the empirical rule from Probability and Probability Distributions, which tells us the percentage of values that fall within one, two, and three standard deviations of the mean in a normal distribution. From the empirical rule, we know that 68% of the values fall within one standard deviation of the mean, 95% of the values fall within two standard deviations of the mean, and 99.7% of the values fall within three standard deviations of the mean. And so here is a visual,
a normal curve, so you can visualize one, two, and three standard deviations. Here we have one standard deviation from the mean, here we have two standard deviations from the mean, and here we have three standard deviations from the mean.

When we have a normal model for the sampling distribution, the mean of the sampling distribution is the population proportion. These ideas translate into the following statements: 68% of the sample proportions fall within one standard error of the population proportion, 95% of the sample proportions fall within two standard errors of the population proportion, and 99.7% of the sample proportions fall within three standard errors of the population proportion.

Therefore, the empirical rule tells us that there is a 95% chance that a sample proportion is within two standard errors of the population proportion. A margin of error equal to two standard errors will then produce an interval that contains the population proportion 95% of the time. In other words, we will be right 95% of the time. The other 5% of the time, the confidence interval will not contain the population proportion, and we will be wrong.

We can make similar statements for the other confidence levels, but these are less common in practice. For now, we focus on the 95% confidence level.

With the formula for the standard error, we can write a formula for the margin of error and for the 95% confidence interval. This is what we saw previously: sample statistic plus or minus the margin of error. Here the margin of error is equal to two standard errors, so as a formula we have p-hat, the sample proportion, plus or minus 2 times the standard error, which we compute with the formula we saw earlier.

Remember that we can make a statement
about our confidence that this interval contains the population proportion only when a normal model is a good fit for the sampling distribution of the sample proportions. So you need to make sure that the normal model is a good fit. How do we do that? You need to make sure that the conditions on the expected numbers of successes and failures are met.

Comment: You may realize that the formula for the confidence interval is a bit odd, since our goal in calculating the confidence interval is to estimate the population proportion p, yet the formula requires that we know p. For now, we use an estimate for p from a previous study when calculating the confidence interval. This is not the usual way statisticians estimate the standard error, but it captures the main idea and allows us to practice finding and interpreting confidence intervals. Later, we will estimate the standard error using p-hat, as we did in the previous quiz. Using p-hat to estimate the standard error is common in statistical practice.

Example: Overweight men.

Recall the use of data from the National Health Interview Survey, conducted by the CDC, to estimate the prevalence of certain behaviors such as alcohol consumption, cigarette smoking, and hours of sleep for adults in the United States. In the 2005-2007 report, the CDC estimated that 68% of men in the United States are overweight. Suppose we select a random sample of 40 men this year and find that 75% of them are overweight. Using the estimate from the survey that 68% of U.S. men are overweight, we calculate a 95% confidence interval and interpret the interval in context.

We first have to check the normality conditions. Checking them, we can see that yes, the conditions are met: the expected numbers of successes and failures in a sample of 40 are each at least 10. We expect 68% of the 40 men to be overweight, so here we have n × p, where n is my sample size of 40 and p is my probability of success, in this case 68%, so
we get about 27 expected successes, and n × (1 - p), with 1 - p equal to 32%, gives about 13 expected failures. We can use a normal model, so 95% of the time a confidence interval with a margin of error equal to two standard errors will contain the proportion of overweight men in the United States this year.

We go ahead and calculate the standard error using the formula. The standard error is a square root, and inside the square root we put this fraction: 0.68 × 0.32 / 40. We take the square root of that and get approximately 0.074.

Next, find the 95% confidence interval. This is where we use sample proportion plus or minus margin of error. Our sample proportion is 0.75, and the standard error is what we just found, so we go ahead and do the calculation: 0.75 plus or minus 2 times 0.074. Our lower limit is 0.602 and our upper limit is 0.898.

The last thing we do is interpret. We say that we are 95% confident that between 60.2% and 89.8% of U.S. men are overweight this year.
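
For anyone who wants to check the arithmetic in the overweight men example, here is a minimal Python sketch of the same three steps: check the normality conditions, compute the standard error, and build the interval with a margin of error of two standard errors. The function names (`normal_model_ok`, `standard_error`, `confidence_interval_95`) are made up for this illustration and are not from the course. The sketch follows the approach used in this lesson, where the standard error is computed from a prior estimate of p (here 0.68) rather than from p-hat.

```python
import math


def normal_model_ok(n, p, threshold=10):
    """Return True if expected successes and failures are each at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


def standard_error(n, p):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def confidence_interval_95(p_hat, n, p):
    """Approximate 95% interval: p_hat plus or minus 2 standard errors.

    Assumption for this lesson only: the standard error uses a prior
    estimate `p` of the population proportion, not p_hat.
    """
    margin = 2 * standard_error(n, p)
    return p_hat - margin, p_hat + margin


# Values from the overweight men example.
n = 40          # sample size
p_prior = 0.68  # CDC estimate used in place of the unknown p
p_hat = 0.75    # proportion observed in this year's sample

if normal_model_ok(n, p_prior):
    se = standard_error(n, p_prior)
    lower, upper = confidence_interval_95(p_hat, n, p_prior)
    print(f"standard error: {se:.3f}")               # 0.074
    print(f"95% interval: {lower:.3f} to {upper:.3f}")  # 0.602 to 0.898
else:
    print("Normal model is not a good fit; the interval is not justified.")
```

The sketch keeps the standard error unrounded when it builds the interval, and the limits still round to 0.602 and 0.898, matching the hand calculation.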