Confidence Intervals for Proportions

Hi everyone, this is Matt Teachout with intro stats. We're going to continue our discussion now on how to calculate confidence intervals. Last time we were talking about the idea of a confidence interval. We said that it was two numbers that we think the population parameter might be in between, so we're trying to figure out a population parameter and see what it's in between. The topic of today is how to calculate one-population proportion confidence intervals. This is really important if you ever try to figure out a population percentage, like the percentage of your customers at your business that will like one of your items, or try to find some kind of percentage characteristic of a population. This is what we use this for.

All right, so one-population proportion confidence intervals. We found out last time that the formula we often use for this is the sample statistic plus or minus the margin of error. Again, we're going back to definitions here: we said the margin of error was how far off we think the sample statistic could be from the population parameter. That's margin of error, a very important definition. But the question we didn't really address last time was: how do you calculate that?

Well, it really depends on the situation. But if you remember our study of the empirical rule, the empirical rule was about normal curves and normal sampling distributions, and we started looking at this idea that if you look at a standard normal curve, about two standard deviations away is 95%. So if you go two standard deviations above and two standard deviations below, then you get about the middle 95%. And it made sense that if you're looking for the middle 95% in a situation where you have a normal sampling distribution, you might want to go two standard errors away. That's kind of where that came from. So one formula that got adopted really quickly was this formula: sample statistic plus or minus two times the
standard error. Remember, we said standard error was the standard deviation of the sampling distribution, so two standard errors would be about the margin of error. Some people still use that formula as a sort of shortcut, an easy formula for the margin of error: two times the standard error. So that was the formula for a while.

But if you think about the number two, the number two is how many standard errors away we need to be. The empirical rule says it's about two for 95%. Well, the number of standard errors is really a z-score, right? We talked about how z-scores have this idea of how many standard errors away. So this two, in a sense, could be thought of as a z-score. So the formula kind of adapted to z-score times the standard error, and now we can use the z-score for 95% or 90% or 99% when we're calculating our confidence interval, so this z-score can change now.

You can look at the previous video about how to calculate critical values, but we did look them up in the previous video, and we found that the critical value z-score for 90% was 1.645, for 95% was 1.96, and for 99% was 2.576. So, you know, the empirical rule is not wrong. The empirical rule said that for 95% it's about two. Well, 1.96 is about two, right? It's pretty close; it's just a little better accuracy. These are actually very famous z-scores. We all have these numbers memorized.

So let's go back to the formula here and see how we can calculate this now. I have the z-score. Now again, before computers, statisticians and mathematicians had to come up with ideas of how to estimate the standard error, so they came up with estimation formulas for standard error. And since we're looking at proportions today, percentages, we're going to be using the sample proportion; if you remember, that's p-hat. So the formula that statisticians came up with was the square root of p-hat times (1 minus p-hat) over n. I know it looks weird, but
it works. Actually, it's very close to what a real sampling distribution would give you. If we calculated a sampling distribution on StatKey and calculated the standard error from the sampling distribution, it would come out pretty close to what this formula would say. So this became the standard formula. In most computer programs, if you click one-population proportion confidence interval, this is the formula they're pre-programmed to calculate for you. They'll put in the z-score, they'll calculate p-hat and put it in. They usually just need the number of successes out of the total number of trials, and then they'll calculate this for you.

But I'm going to go ahead and calculate this now. I'm going to do an example here. Again, this is not something that you do in real life for the most part; we always have computers do this. This is more about just understanding what the computer is doing. So this doesn't mean I want you calculating like this all the time. I want you to get comfortable using technology to calculate.

So let's look at an example. We have a sample of COC stat students from the Canyon Country campus. We had a total of 108 in the sample, and four of them smoked cigarettes. My question was: what percentage of Canyon Country students actually smoke cigarettes? Well, I could calculate the sample proportion, so let's do that: 4 divided by 108 is about 0.037. But does that mean that the population percentage is 0.037? I hope you said no. Remember our discussion last time when we introduced confidence intervals: this number is going to be off from the population percentage. There's going to be a margin of error with this. So we know that 0.037 is probably not the actual population percentage. So what could the population percentage be? That's the big question, and that's what the confidence interval is going to answer for us. So I'm
going to plug in these numbers. I've got p-hat, which was 0.037. Now I do need to know what my z-score was, my critical value z-score for 90% confidence. We looked these up with StatKey: 90% was 1.645. By the way, the plus or minus on the critical value kind of connects with the plus or minus in the formula here, so you can really just put in 1.645; the plus or minus is taken care of in the formula. So I'm replacing this z with 1.645 and the p-hat with 0.037. The n is the sample size, so 108 people. And out front there's my original p-hat.

Now it's just a matter of crunching some numbers. If you take the square root of 0.037 times (1 minus 0.037), all divided by 108, you get about 0.01816. That is the standard error. What I always like to do when I'm doing stuff like this is note what the standard error was. So the approximate standard deviation of the sampling distribution is about 0.018, and I'm going to write that down right here. By the way, that would be about 1.8% if I was thinking about it as a percentage.

All right, now I'm going to go ahead and multiply 1.645 times the standard error, and that's going to give me the estimate of the margin of error. So really the most important formula up here right now is this idea that it's the sample statistic plus or minus the z-score times the standard error, or the critical value times the standard error. That's probably the more important formula up here, so you have the idea in your head about how this is calculated and what the computer is doing.

All right, so if we multiply these together, we get the margin of error right here: 0.030. I like to make a note of that: the margin of error in this case was 0.030. So we have roughly a three percent
margin of error in this case. Again, the smaller the sample size, the bigger the margin of error, so it's not surprising we have quite a big margin of error when we have such a small sample.

Now you're just going to do adding and subtracting. To get the confidence interval, you take the sample statistic, in this case the sample proportion 0.037, plus or minus the margin of error, 0.030. So 0.037 minus 0.030 is 0.007. We said last time this is called the lower limit of the confidence interval. Adding them, 0.037 plus 0.030, we get 0.067. That's called the upper limit of the confidence interval. So if my sample proportion was 0.037, that doesn't tell me the population is 0.037. It tells me that I think the population percentage could be anywhere from 0.007 to 0.067. In other words, writing these as percentages, I can think of that as 0.7% and 6.7%. So I'm 90% confident (remember, we used the 90% confidence level) that the population percentage of stat students at Canyon Country that smoke is somewhere between 0.7% and 6.7%. That's the main idea with this.

Now remember, it's not about calculating this; the computers are going to calculate this. What I really want to know is: is this accurate? How accurate is this formula? That's what you really want to focus on and be able to explain. The accuracy of this formula is really tied to z-scores and standard error, both of which come from normal sampling distributions. In other words, we'd have to have a relatively normal sampling distribution for sample proportions. So if we're thinking about that, remember our study of sampling distributions for proportions: we needed at least 10 successes and at least 10
failures for the sampling distribution to look normal. That brings us, over here, to assumptions. Every kind of inferential technique has assumptions that tell you when that formula is accurate and when that formula is not accurate. So we wanted a random sample or representative data, we want our individuals to be independent of each other, and we want at least 10 successes and at least 10 failures. In this case we definitely have at least 10 people that didn't smoke; in fact, 108 minus 4 is 104. But we did fail this one, right? We failed the at-least-10-successes condition, since only 4 students in the sample smoked.
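
For anyone who wants to see what the computer is doing with the example above, here is a minimal Python sketch of the same calculation. It is only an illustration: the function names `proportion_ci` and `meets_success_failure` are made up for this sketch and do not come from StatKey or any particular stats package. It assumes the z-score formula from the video (sample proportion plus or minus the critical value times the estimated standard error) and the 90% critical value of 1.645.

```python
import math


def proportion_ci(successes, n, z):
    """Sample proportion plus or minus z times the estimated standard error.

    Hypothetical helper for illustration only.
    """
    p_hat = successes / n
    # Estimated standard error: square root of p-hat * (1 - p-hat) / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * standard_error
    lower = p_hat - margin_of_error
    upper = p_hat + margin_of_error
    return p_hat, standard_error, margin_of_error, lower, upper


def meets_success_failure(successes, n, minimum=10):
    """Check the 'at least 10 successes and at least 10 failures' condition."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# The example from the video: 4 smokers out of 108 students, 90% confidence.
p_hat, se, margin, lower, upper = proportion_ci(4, 108, 1.645)
print(f"p-hat = {p_hat:.3f}, standard error = {se:.4f}, margin = {margin:.3f}")
print(f"90% interval: {lower:.3f} to {upper:.3f}")
print("At least 10 successes and 10 failures?", meets_success_failure(4, 108))
```

Because the sketch keeps the unrounded value of 4/108 instead of 0.037, the standard error comes out a hair different from the hand calculation (about 0.0182 rather than 0.01816), but the interval still rounds to about 0.007 to 0.067. The last line reports that the success and failure condition is not met, which matches the point above that only 4 students in the sample smoked.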
