In this video, I'll talk about confidence intervals. In particular, we'll discuss the concept of a confidence interval, how to construct a 95% confidence interval, how we can go about changing the confidence level, and how to interpret confidence intervals. Let me know if I'm saying "confidence interval" too few times. ;)
A confidence interval is motivated by a particular issue that we have with a point estimate. If we only report a point estimate, "p-hat", it's kind of like fishing in a murky lake with a spear: we probably will not hit the exact population proportion. A confidence interval is like casting a net. If we report a range of "plausible values", we have a pretty good shot at capturing the parameter.
Here we're going to talk about how to construct a 95% confidence interval. In the normal distribution, 95% of the observations are within 1.96 standard deviations of the distribution's center. Using this principle, if a point estimate can be modeled using a normal distribution, we can construct a plausible range of values for the parameter by looking within 1.96 standard errors of the point estimate. So our general formula is "point estimate ± 1.96 * [standard error]". For the proportion context, that's the sample proportion (p-hat) plus/minus (±) 1.96 times the formula for the standard error for a sample proportion. This is called a 95% confidence interval for the parameter "p".
Let's consider an example. Pew Research found that 88.7% of a sample of 1,000 U.S. adults supported expanding the use of solar energy. We'll construct a 95% confidence interval for the population proportion based on the sample. The first thing we should do is check conditions. We did this in an earlier video for this poll, so we're going to skip this step here. The next step is to compute the standard error for the point estimate, which we also did in an earlier video, and it was 0.01.
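The standard error of 0.01 quoted for the solar poll can be reproduced with a few lines of Python. This is only a sketch: it assumes the usual formula for the standard error of a sample proportion, sqrt(p(1 - p) / n), with the sample proportion plugged in for p, and the function name is made up for illustration.

```python
import math


def standard_error_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion, with p_hat plugged in for p.

    Hypothetical helper name; the formula sqrt(p(1 - p) / n) is the
    standard one for a sample proportion.
    """
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Solar energy poll from the video: 88.7% of 1,000 U.S. adults.
se_solar = standard_error_proportion(0.887, 1000)
print(f"standard error = {se_solar:.4f}")
```

Rounded to two decimal places, the result agrees with the 0.01 used in the video.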
Next, let's construct our confidence interval. We have our sample proportion ("p-hat") plus/minus (±) 1.96 times the standard error. Plugging in our values, we get 0.887 ± 1.96 times 0.01. This gives us a confidence interval of 0.8674 to 0.9066. We interpret this as, "we are 95% confident that the actual proportion of American adults who support expanding solar power is between 86.74% and 90.66%."
In a 95% confidence interval, 95% is the "confidence level". Sometimes we want to use a different confidence level, such as 90% or 99%. Do you think a 90% confidence interval will be wider or slimmer than a corresponding 95% confidence interval? Think back to our earlier analogy with trying to catch the fish. If we don't need to be as confident that we'll catch a fish, a smaller net will typically do. This means a 90% confidence interval will be slimmer than a 95% confidence interval.
So how do we figure out how wide our confidence interval should be, and how might we construct a 99% confidence interval? Let's think back to where the 1.96 came from. Earlier, when we were constructing the 95% confidence interval, we found that -1.96 to +1.96 captured 95% of the normal distribution. Similarly, we're going to look for values, in this case -2.58 to +2.58, which capture 99% of the distribution. This means that for a 99% confidence interval, we'd use the point estimate plus/minus 2.58 times the standard error, if our point estimate is normally distributed.
As a general rule for changing the confidence level, if a point estimate closely follows a normal model with a standard error "SE", then a confidence interval for the population parameter is the point estimate plus/minus Z-star times the standard error, where Z-star corresponds to the confidence level selected. So for a 95% confidence interval, that was 1.96. For a 99% confidence interval, it was 2.58. Let's consider another example.
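The general rule for changing the confidence level can be sketched in Python. The function names `z_star` and `confidence_interval` are hypothetical, chosen only for this illustration. The sketch assumes the point estimate is well modeled by a normal distribution and uses the standard library's `statistics.NormalDist` (Python 3.8+) to look up Z-star.

```python
from statistics import NormalDist


def z_star(confidence_level: float) -> float:
    """Z-star such that (-z*, +z*) captures `confidence_level` of a standard normal."""
    tail = (1 - confidence_level) / 2  # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)


def confidence_interval(point_estimate: float, standard_error: float,
                        confidence_level: float = 0.95) -> tuple:
    """Point estimate plus/minus Z-star times the standard error."""
    margin = z_star(confidence_level) * standard_error
    return point_estimate - margin, point_estimate + margin


# Solar poll values from the video: p-hat = 0.887, SE = 0.01.
for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(0.887, 0.01, level)
    print(f"{level:.0%}: z* = {z_star(level):.2f}, "
          f"interval = ({lower:.4f}, {upper:.4f})")
```

Running the loop shows the net getting wider as the confidence level goes up, with Z-star rounding to the 1.96 and 2.58 used in the video for 95% and 99%.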
In the Pew Research poll about solar energy, they also asked about other forms of energy, and 84.8% of the 1,000 respondents supported expanding the use of wind turbines. So is it reasonable to model the proportion of U.S. adults who supported expanding wind turbines using a normal distribution for this poll? Let's check our conditions. First, we check independence: because the data come from a simple random sample, independence is satisfied. Next, we check the success/failure condition. Here we'll check it with the plug-in principle using our observed proportion: 0.848 times 1,000 is 848 -- that's certainly greater than 10 -- and 0.152 times 1,000 equals 152, which is also at least 10. With both conditions satisfied, we can use the normal distribution to model the sample proportion.
Here, we'll create a 99% confidence interval for the proportion of Americans who support expanding the use of wind turbines for power generation. The point estimate is 0.848, and next we need the standard error. We use the plug-in principle to compute this standard error by putting the sample proportion in place of the population proportion in the formula. We get a value of 0.0114. Next, we can construct the confidence interval itself: 0.848 ± 2.58 * 0.0114. This gives us a confidence interval of 0.8186 to 0.8774. This means we're 99% confident that the proportion of American adults that support expanding the use of wind turbines is between 81.9% and 87.7%.
In each of the examples that we've talked about in this video, we described the confidence intervals by putting them into the context of the data and also using somewhat formal language -- "we are 95% confident that the actual proportion of American adults who support expanding solar power is between 86.7% and 90.7%" and "we are 99% confident that the proportion of American adults that support expanding the use of wind turbines is between 81.9% and 87.7%." Confidence intervals are only focused on capturing the population parameter.
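The arithmetic in the wind turbine example above can be double-checked with a short Python sketch. It assumes the usual standard error formula for a sample proportion, sqrt(p(1 - p) / n), and takes the Z-star of 2.58 from the video rather than computing it. The variable names are illustrative only.

```python
import math

Z_STAR_99 = 2.58  # Z-star for 99% confidence, as quoted in the video

p_hat = 0.848  # sample proportion supporting wind turbine expansion
n = 1000       # number of respondents

# Success/failure condition, checked with the plug-in principle.
successes = p_hat * n
failures = (1 - p_hat) * n
conditions_met = successes >= 10 and failures >= 10

# Standard error with p_hat in place of the population proportion.
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 99% confidence interval: point estimate plus/minus Z-star times SE.
lower = p_hat - Z_STAR_99 * se
upper = p_hat + Z_STAR_99 * se

print(f"success/failure condition met: {conditions_met}")
print(f"standard error = {se:.4f}")
print(f"99% interval = ({lower:.3f}, {upper:.3f})")
```

Because the sketch keeps the unrounded standard error, the endpoints can differ from the video's in the fourth decimal place, but they round to the same 81.9% and 87.7%.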
The confidence interval says nothing about trying to capture individual observations. It also says nothing about future samples. So what does "95% confidence" or "99% confidence" actually mean? A confidence level describes what would actually happen if we took many samples and constructed a confidence interval for each. The confidence level describes the fraction of those intervals that capture the population parameter of interest. While it's useful to think of a confidence level as a probability, there are technical reasons why it is not a probability, so never describe it as such. If you do on a homework or an exam or a quiz, it'll almost certainly get marked as wrong.
To recap, we covered the concept of a confidence interval, which we illustrated with the analogy of fishing with a net instead of a spear. We covered how to construct a 95% confidence interval; here we used a point estimate plus/minus 1.96 times the standard error. We also discussed how to change the confidence level, where we replace 1.96 in the confidence interval formula with a different value that corresponds to the confidence level that was chosen. We also discussed how to interpret confidence intervals; we use language such as, "we are 95% confident that ..." and so on, and we never use the word "probability" when describing a confidence interval. Always remember that the confidence interval only tries to capture the population parameter -- it does NOT try to capture individual observations, and it says nothing about future samples.
The mission of OpenIntro is to make educational products that are free, transparent, and lower barriers to education. Each year, teachers who choose OpenIntro resources save students over $1 million. We hope you'll join us at openintro.org.