Transcript for:
Understanding Sample Size and Margin of Error

Hey there, we're on section 10.C, in-class activity 10.C, and we're going to talk about sample size. Today we're not going to be using algebra, we're going to be using technology. There's a little bit of algebra for background, so bear with me on that. Okay, let's get started. I hope you're looking at my worksheet. This is in-class activity 10.C, and we're looking at sample size calculations for proportions, and we're going to use technology. So let's get started with a warm-up. I hope at this point you are seeing statistics everywhere, more so than you ever did, because you're in this class. It was always there, but now you're seeing it. Journalists are reporting about polls, about how popular the president is, and advertisers are trying to get you to buy products and they're giving you information about which toothpaste is better. There's a lot of discussion about healthcare, about how much disease there is in the country. And there's always this possibility of a margin of error. I think I was using pink for margin of error. So we've got this margin of error. You're saying, these are the results, give or take. It's a little bit of a disclaimer. But the size of the margin of error can vary greatly from study to study, depending on a lot of factors. The best way to understand those factors is to look at the formula. I know I said that we wouldn't be doing algebra, but let's look at the formula. The margin of error equals z-star, which is our cut point associated with our confidence level, times our standard error, or standard deviation, depending on whether you know what p is or you don't. I'm going to use standard error, because if we really knew p, we wouldn't need to do this study. So that is our margin of error. Your book just says E for error. So we know that's the margin of error for your confidence interval.
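Written out, the formula just described looks like this. It is a sketch of the standard margin-of-error formula for a proportion; the symbols follow the spoken description, and the exact notation on the worksheet may differ.

```latex
E = z^{*} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```

Here E is the margin of error, z-star is the cut point for the chosen confidence level, p-hat is the sample proportion, and n is the sample size.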
And if you drop the hats, it's the standard deviation of your sampling distribution. If you look at this, the sample size is the n. The bigger your sample size, the bigger the denominator of the fraction, and the smaller the whole thing is. So the easiest way for a researcher to adjust the margin of error is to adjust the sample size. If you make that sample size bigger, your margin of error will shrink. But one thing I want to draw attention to here: alternately, we can be conservative and say 0.5 is our proportion. We're estimating these proportions, right? If we haven't yet done a poll, we might have no idea what p-hat is. So if you want to be responsible about margin of error, what p-hat should you just use as a default? That's what we're wondering. There are all kinds of factors, but let me just ask you to pause the video and go ahead and answer in your own words: what kinds of factors do you think might influence a researcher's decision about what margin of error is acceptable? The easiest way to shrink a margin of error is to increase the sample size. And if you want to cut corners, sample size is the only thing you can easily mess with. So pause and answer this question. Okay, so what'd you guys come up with? What kinds of factors do you think might influence a researcher's decision about what margin of error is acceptable? And how would you change the margin of error? Well, what I want to do first is rearrange this. If I want to get n all by itself, I'm going to do a whole lot of algebra. It's kind of like rearranging the furniture. And if you want me to go through the algebraic steps, I will. Basically, what you would do is divide both sides by z-star. Then you'd square both sides.
Then you might multiply both sides by n. So you're using the golden rule of algebra: whatever you do to one side of the equation, as long as you do the same thing to the other side, except for dividing by zero, you still have an equation. So I can rearrange this. And if I rearrange this, I get n all by itself equals p-hat times 1 minus p-hat, which kind of comes from the data, times z-star, which is set by whatever confidence level we want, over E, and that z-star over E has to be squared. So there it is. If you want the margin of error to be as small as possible, and it's kind of easier to see over here in the original formula, you probably have to ratchet up the n. And if you don't care how big your margin of error is, then you can have a small sample size. So when we're talking about this right here, what I want you to think about is p-hat times 1 minus p-hat. If you're going for sample size and you don't have any idea what your sample proportion is, well, let's look at some patterns over here. If p-hat is 25%, then what's 1 minus p-hat? That's going to be 75%. These two always add up to 100%. And when you multiply those together, you get 0.1875. Now, what if p-hat were 0.5? Then 1 minus p-hat is also 0.5, and when we multiply those together, we get 0.25, which is a lot bigger than the first one. And we could mess with it. What if 65% is our proportion? Then 1 minus 65% would be 35%, and when you multiply those together, you get 0.2275. So the reality is this combo right here, 0.5 and 0.5, yields the biggest product. This one's the biggest.
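The pattern in those three products can be checked with a few lines of Python. This is only an illustrative sketch: the function name `spread_term` and the grid of candidate proportions are assumptions made for the example, not part of the worksheet or the Dana Center tool.

```python
def spread_term(p_hat):
    """Return p_hat * (1 - p_hat), the part of the formula that depends on the proportion."""
    return p_hat * (1 - p_hat)


# The three proportions tried in the lesson.
for p_hat in (0.25, 0.5, 0.65):
    print(f"p_hat = {p_hat:.2f}  ->  p_hat * (1 - p_hat) = {spread_term(p_hat):.4f}")

# Scan proportions from 0.01 to 0.99 (a hypothetical grid) to see where the product peaks.
candidates = [k / 100 for k in range(1, 100)]
largest = max(candidates, key=spread_term)
print(f"largest product occurs at p_hat = {largest}")
```

On this grid the product peaks at 0.5 and falls off symmetrically on either side, which is why 0.5 is the worst case for the sample size.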
So if you have no idea what p-hat you should be using, and you want to be responsible, so you don't want to be cutting corners, that's why they recommend, if you don't know p-hat, picking 0.5. They write it as p with a tilde, I don't know why, but p-tilde equals 0.5 is the conservative default for p-hat if you as the researcher have no idea what you should be using. If someone says, well, what should the sample size be? I don't know. Then you're going to plug in 0.5, and it'll go in both places. Of course, it doesn't let me write that. Interesting. Okay. So that's the conservative choice. Just throw in a p-hat of 0.5. You'll be getting a bigger sample size, but you're being responsible. So we're back to what drives what margin of error the researcher wants. I think all researchers would want a tiny margin of error. But you can't really mess with the confidence level, it's usually given to you, and you're going to put in 0.5 and 0.5 here, so the only thing you can mess with is the sample size. So a tiny margin of error almost certainly means a large sample size, which is expensive. So if your question is what flavor of ice cream is going to sell the most on a given day at the beach, well, if you're off a little bit, no one's going to die. But if you're building surgical instruments, and the tiniest bit of difference in size could lead to the surgeon making a mistake and people dying, then you're going to want to have a really small margin of error. So yes, a bigger sample size is more expensive, but if your product ends up killing people, you lose money there too. So even for the greediest people, if error results in people dying, you need a small margin of error, which almost always means a large sample size. But if you can afford to take a risk, if error isn't so bad, doesn't cause severe
consequences, then cut corners and save money, because the smaller your sample size, the more money you save. And so I just rearranged this right here. My name's Bronwyn. Bronwyn's my name. It's the same information rearranged, but if you rearrange it into the formula on the right, then it kind of stands out that as your error gets smaller, your sample size has to get bigger. So it focuses on what that sample size is. And the beautiful thing is that the Dana Center did that for you. They have this formula loaded in a part of their tools so that you can put in a desired margin of error and, presto bingo, get the desired sample size. So you stuff in the margin of error that you want, and what gets spit out is the desired sample size. No algebra needed. So that's what we're going to do. The margin of error is set by researchers. Say you're making some kind of a surgical unit, and you have to have a margin of error of at most two millimeters. Plug that in there. Be responsible and assume that p-hat is the one that generates the most error, which is 0.5. And then you will get out your sample size. Okay. So there's not much to cover in this section. While keeping the confidence level the same, so that's our z-star, we're not going to change the z-star, our margin of error will decrease as our sample size increases. And you can kind of see that from the formula right here. Increase this, this shrinks. Decrease this, this grows. Okay. You'll be able to determine the sample size given the margin of error when working with proportions, and we're using technology, which makes us happy. OK, so here's an example. A manager of a bookstore at a large university is planning for a new semester and must decide how many books to stock in the store. The manager has noticed that an increasing proportion of students are buying books online instead of at the bookstore. This is a real problem. This is a problem for our bookstore too.
So the manager decides to do a survey. He's going to send out a survey to a random sample of students. Good job, manager. He must've taken stats before being a manager. He wants to estimate how many students are going to buy books, so he knows how many books he should order. The manager needs to determine how many students to survey. So there are two questions here. He wants to know, how many books should I order? How many students are really going to buy from the bookstore? So his population of interest is all students, and the proportion he's interested in is the proportion who are going to buy books from the bookstore. But his question right now is, how many students should he survey? The manager has no idea how many students actually buy their books online. What value should the manager use for p? He doesn't have any idea what p should be. What should he plug in as a default p for the sample size calculation? Explain why that value will give the manager the best sample size. So why don't you pause here? If we don't know p, what should we put in? What we should put in is the best default if you know nothing, and explain why. I did just kind of talk about it, so look over your notes and say why that should be. This p-hat, or p with the swirly hat, the default one, yields the biggest sample size. It's your worst case scenario. You're saying, I don't know what p should be. Here's my formula. I don't know what p should be, but if I plug in 0.5, I'm being the most responsible, because it's my worst case scenario. It's going to generate the biggest error. So: worst case scenario for generating the biggest error. Okay. The manager decides to use a confidence level of 0.95. He might not actually be the one deciding that. Usually it's someone like the chancellor's office that will say:
We want you to be pretty confident in your results. And the powers that be say that the margin of error should be 8%. So that is our E. E equals 0.08. Okay. Why might the manager avoid a smaller margin of error? Why is it being set at 8%? He wants to know what proportion of students are going to buy at the bookstore, give or take 8%. Why 8%? He thinks 8% is like a sweet spot. So this is the question: what are the implications of the margin of error? Why might the manager avoid a small margin of error? That wording is a little confusing. He's saying, make it big. Why? If he or she, I'm going to say he, can tolerate a larger margin of error, then he can have a smaller sample size, which saves time. Getting that random sample of students takes time, and if he only has to get, you know, a small group of students, then it might not be as accurate, but he doesn't have to spend as much time. So it saves time, and he probably makes a pretty nice amount of money per hour, so it also saves money, in terms of how much you pay researchers. But the other question is, why might the manager avoid a larger margin of error? If he goes for a larger sample size, his results are more accurate, and he gets better information on exactly how many books he should buy, which actually will save him money in the long run, because storage might be super expensive. If you don't sell the books, you have to ship them back to the publishers or you have to store them. That all costs money too. So he's saving money on storage, on shipping back to publishers, or on having to do a desperate rush order because he got the wrong information. He bought too few books.
And that means that the students will go without books for the first few weeks of school, setting them up for failure if it's a math class, quite frankly. So there are lots of consequences if you don't get precise information. If you get too many books, it costs you money because they sit on the shelf. It's a waste of space, and it's a waste of energy sending them back, because you don't want to hold on to those textbooks when maybe next semester they'll use a new edition. Or it could be that there aren't enough books for the students and the students fail. So from my point of view as a math teacher, the consequences of not having books at the beginning of the semester can be devastating. I would want him to minimize the error and actually bite the bullet and increase the sample size. And I would say, yeah, I'll help you with that. It's really important that our students don't go without books. But it's a business decision, and you always have to weigh the pros and cons. Okay, so now we're going to use our beautiful tool here. We're going to go to the proportion part, because we're still talking about proportions, and we're going to see this wonderful tab for sample size. You can just cut and paste the link in, but maybe it's not going to work, so I'm going to bookmark it and go to inference. That just happened. Come back to me. I'm going to go to inference right here, open that up and click on that. And I'm going to notice something that I haven't pointed out before, which is right here. You've got a tab, and I'm going to click the sample size tab. And let's see. Go to that. So demo list. Right. This is problem number two, answering all the questions. Okay. Well, if you didn't do this with the tool, you could do this by hand. Just watch. We know that the error equals z-star times the square root of p-hat times 1 minus p-hat over n. Okay.
We know that we should replace the p-hats with 0.5, because that's being the most conservative. That's the most error. So it'll be z-star times the square root of 0.5 times 0.5 over n. Our z-star is going to be 1.96, because it's a 95% confidence level, and that's the z-star that goes with that. So things are looking kind of nice. Maybe we don't have to use the Dana Center tool. We can just plug that in and figure that out. It's all numbers. Oh, and we want our margin of error to be 8%, so I'll pop in 0.08 as a decimal. Okay. So we need to find the magical n. 0.5 times 0.5 is 0.25. So things are chugging along really nicely. Who needs technology, right? Well, you need to solve for that n. So what you would do is divide both sides by 1.96. Cancel, cancel. And then you would square both sides. So 0.08 over 1.96, squared, equals 0.25 over n, because I squared both sides. Oh my gosh, now we have to flip everything. Oh, forget it. I told you that you didn't have to write that down. We're not going to do that. We're going to go to the Dana Center instead. And we're going to say, I'm so glad I'm not my parents. Go to Dana Center Math Pathways Tools. And by the way, if any of you become rich and wealthy, donate to the Dana Center, because this is all free. It's a nonprofit. So you go here, you click there, you pick the sample size tab, and you just throw it all in there, and you don't do any of that algebra. So this homework is going to be fast and quick. For the desired confidence level, we're given that we want a 95% confidence level. So we've set this right here to 95, which is the default. Okay. And notice I have the conservative approach as a default, because we are assuming the worst case scenario, so we're going to plug in 0.5 for p-hat. All you need to do is click that little arrow right here.
And if you know something, if you say, I know that 20% of the people are not going to buy, then you can put that information in. But if you don't know, we just click that conservative approach for 0.5. Worst case scenario. What else do we need? We should set the margin of error. We know that we want our margin of error to be 0.08. That was given to us by maybe the chancellor's office. So that's the sweet spot where enough students will get books and you won't have to pay for storage. They made some kind of decision. So now we're going to use the slider right here and slide it to 8%. There we go. And presto bingo, the tool does all that work and we get our sample size without any pain and suffering. For all of this, we need an n equal to 151. So you don't need to do the algebra. You just need to realize that in the background, the Dana Center had all this information. They rearranged the formula, plugged it in, and did all that work for you. So let's do it again. I think it'll be a lot nicer. Now play around with this and see how it goes. For problem number three, I'd like you to pause and read problem number three, and pick out the critical pieces of information that you need to plug in. You're going to need to identify the margin of error that you want. You're going to need to identify what your confidence level is. I'll call that C level. It's usually 95 or 90 or 99. Sometimes it's something else. And you need to identify what your p should be. If you're not sure, then the default is 0.5, so I'll just put that in parentheses. If you don't know, assume 50%, because that's going to generate the most error. So go ahead, throw that in, and then presto bingo, it'll calculate the sample size for you down here. Okay. So pause and do that.
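For anyone curious, the algebra that was abandoned on the board for the bookstore problem can be finished in a few lines. This is a sketch assuming the same inputs used in the tool: a z-star of 1.96 for 95% confidence, the conservative p-hat of 0.5, and a margin of error of 0.08.

```latex
\begin{align*}
0.08 &= 1.96 \sqrt{\frac{0.5 \times 0.5}{n}} \\
\left(\frac{0.08}{1.96}\right)^{2} &= \frac{0.25}{n} \\
n &= 0.25 \left(\frac{1.96}{0.08}\right)^{2} = 0.25 \times 600.25 = 150.0625
\end{align*}
```

Rounding up to the next whole student gives n = 151, which agrees with the value read off the tool.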
Okay, so an ecologist is studying the impact of honeybees and plans an ecological survey to determine the proportion of bees that have pesticides present in their bodies. Now, if you want to know about the negative consequences, I think Einstein said something about when the insects die, we have like months to live. And bees are the most important because they do pollination for us. Okay. The ecologist decides that the confidence level should be 0.95. So we've got this one. And the margin of error is 5%, so 0.05. Always do it as a decimal. So we've got that 0.05. And I think we've got everything. But they do not have an estimate for the sample proportion. If they don't have an estimate for the sample proportion, and I think honeybees are super important, we're just going to keep this 0.5 as a default. Because they don't have an estimate, use the web tool. So we're now going to plug all that in and we're going to get n equals blank, and we're going to get it from down there. Our confidence level is 95. I don't need to change that. It's already set, so I'm going to keep that right where it is. And the margin of error is 5%. Oh, I guess we can do it as a percent here, okay. So we've got that. And we're using the conservative approach, so we keep this checked right here. And presto bingo, our sample size has to be 385. So if you want to do the algebra to verify it, you can, but if you don't, you can use this tool. So let's do number four. Pause it and do it again for number four. Identify the given information, plug it in, and meet me back here. Okay. So welcome back. A biotech company is developing a new rapid test for influenza. Really important. People will die if the results are wrong. After completing their own testing, they claim that the test is correct 97% of the time with a margin of error of 1% and a confidence level of 95%. Okay.
An independent researcher now wants to conduct another study to verify the results. Use the web tool to calculate the necessary sample size for this study. So what's different here? We've got our confidence level, our C level, which is still 95, the most common. So this doesn't have to change. That stays the same. Our margin of error is 1%, so I'll move that to 1. Okay. So this is the part that's different. They claim that the test is correct 97% of the time. So the biotech company developing the new rapid test is saying that the proportion of times it's correct is equal to 0.97, or 97%. Because of that, we're not going to use 0.5. We're going to use 97%, because they have prior knowledge. Okay. And that's going to mess with the sample size. That's the only thing that's different: we're not doing worst case scenario, because we have better information. And so now, for our sample size, n equals, you look down here, and it is 1,118 people. So if we want to honor the prior knowledge that the success rate is 97%, we've got to plug it in. We won't use the default of 0.5. So that's what's different about that one. I'm going to clear all of this off, and we're really close to being done. So last one, last practice. Pause it and do it yourself. An auto parts company is conducting routine quality control testing of airbags. Ooh, that's important. If they screw up, and some companies have done that, people will die. And then they're going to lose money. Even if they're not sued, people aren't going to want to use their products. Past testing shows that about 2% of airbags were defective. So that tells me my p, the proportion that are defective, because we're studying how defective the airbags are. And we already know from past practice that the proportion is 2%, or 0.02, pick however you want to write it.
That tells us to use that result, so I'm going to use this right here. I'm going to slide that down to 2%. Okay. It's hard to get there. There we go. The company has decided they want 99% confidence. They want a 99% confidence level. So I'm going to change this right here to 99. Okay. And they want the margin of error to be 1%. So for the desired margin of error, oh, it's already at 1%. Was the other one at 1% too? Okay, the margin of error is at 1%. So we plugged everything in, and to get that desired margin of error, n had better be equal to 1,301. So that is the magic. And remember, it might be rounding up, because you always want to round up if you're doing sample size. Okay, it rounds it up for you, though. You don't have to. So now the rest of this worksheet has to do with working in groups. What I really care about here is that you know how to use the tool. So I am going to say that for the rest of this worksheet, we're done. And so you're going to have a shorter homework assignment too. Okay, so we're going to stop right there. The tutors hopefully will know when they're checking your notes that you can stop right there. All right, so we're done with this section. Happy. Take a break. Play with this tool. And so the summary is that we're using this tool right here. I want you to have the idea that if you want to bring your error down, which is usually what researchers discover, oh, that error is too big, the way you do that is you bring your sample size up. You increase your sample size. There probably will be some questions about that, some multiple choice type questions, on the next midterm. And really, with this formula here, now it's a big mess. Usually you don't have control over what the proportion of successes is. And usually you don't have control over what the confidence level is. And usually you're given the margin of error.
And then that determines what your sample size should be. But I want you to have an intuitive understanding. So this right here is usually given. The z-star is connected to the confidence level. So you just need to have an intuitive understanding that if you want to bring the error down, the only thing you as a researcher have control over is the sample size. The only reason we don't have gigantic sample sizes is because it's expensive. And then I want you to be able to use this tool, more than algebra, to find out what those sample sizes are. So you need to identify your proportion of successes from prior knowledge, or set p as a stand-in to 0.5, because that's the worst case scenario. It generates the most error. And then you just need to pluck out the margin of error they gave you and the confidence level. You look for them. They're embedded in the word problems. Then plug them in, and from that, n pops out. Yay. All right. Happy homework. Take a break and knock that one out. It'll be pretty quick.
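The identify-and-plug-in routine from the summary can also be sketched in Python for anyone who wants to check the tool's answers. This is an assumption-laden sketch, not the Dana Center's implementation: the function names `z_star`, `required_sample_size`, and `margin_of_error` are made up for illustration, and z-star is computed from the standard normal distribution with Python's `statistics.NormalDist` rather than read from a table.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Two-sided critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Rearranged formula n = p(1 - p)(z*/E)^2, rounded up to a whole number.

    p defaults to 0.5, the conservative worst-case choice from the lesson.
    """
    raw = p * (1 - p) * (z_star(confidence) / margin) ** 2
    return math.ceil(raw)


def margin_of_error(n, confidence=0.95, p=0.5):
    """Original formula E = z* sqrt(p(1 - p)/n), for checking a sample size."""
    return z_star(confidence) * math.sqrt(p * (1 - p) / n)


# (label, margin of error, confidence level, proportion) for the four worksheet problems.
problems = [
    ("bookstore survey", 0.08, 0.95, 0.5),
    ("honeybee survey", 0.05, 0.95, 0.5),
    ("rapid flu test", 0.01, 0.95, 0.97),
    ("airbag quality control", 0.01, 0.99, 0.02),
]

for label, margin, confidence, p in problems:
    n = required_sample_size(margin, confidence, p)
    achieved = margin_of_error(n, confidence, p)
    print(f"{label}: n = {n}, achieved margin of error = {achieved:.4f}")
```

With these inputs the sketch reproduces the sample sizes read off the tool in the lesson: 151, 385, 1,118, and 1,301. Calling `margin_of_error` with a larger n shows the error shrinking, which is the relationship the summary asks you to keep in mind.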