Obesity Study and Sampling Distribution Insights

Hey there, welcome back. We're on section 9C, in-class activity 9C, and today we're going to be talking about obesity, one of my favorite subjects. We're going to use it to make an even stronger connection between the sampling distribution for proportions and the normal approximation, which we're going to grow to love because it's so much easier to deal with. So here goes.
Warm-up question. In 2017-18, the National Health and Nutrition Examination Survey estimated something shocking: 42.4% of American adults fit the medical definition of obese, which is overweight in a certain category. When you hit a certain level, you go from being just pleasantly plump and overweight to obese. And 42.4% is dangerously close to half of all American adults. Now, if we're in California, we might be more shocked by this than people in other states, because I do think it varies from state to state and California tends to be one of the thinner states. The Midwest tends to be heavier. But overall we're going to treat this as the number that describes everyone. So it's a parameter.
A large medical clinic instituted a wellness and nutrition program. So they started something just for their patients, where the patients could opt in to receive text messages with nutrition and exercise tips, or use an app to monitor their dieting and activity levels. The clinic would like to determine if, after a year on the program, the proportion of its patients who are obese is less than the national average. So we're going to accept that this is an accurate estimate. I'm going to circle that and say p equals this. That's a population proportion, so not p-hat. We're going to assume that this is the true proportion of American adults who are obese.
So question number one: as a statistician on the project, explain how you could conduct a study to decide if there is evidence that the rate of obesity among the clinic's patients is less than 42.4%. Now, the best thing would be an experiment, but it is unethical to experiment on people, especially if you don't get their permission. So I'm going to sculpt this a little bit more. You could do an experiment, but let's not do it on these people, because they might get mad. They're just there for wellness; it's a large medical clinic. So think of an observational study that could help you. Okay, pause and write it down.
Okay, so my suggestion here is to look at the people in the clinic. Take a random sample of people. Oh, no, I've got to be careful, because there are probably children: take a random sample of the adults in the clinic who opted into the program. We're only interested in the people who actually did the program. So, one, take a random sample of them. Two, wait a year, to give them time to get exposed. Three, measure the proportion of the random sample who are obese. And four, compare that, and it's actually going to be a p-hat, to the national average, I should say the national proportion, to see if that p-hat is different than 42.4%.
Okay, so it's a simple enough study. Just wait a year! You probably only want to look at the people who are actually doing the program, and you want to restrict it to adults because that's what that proportion has to do with. I think the obesity level for children is different. Now, the one thing I would suggest improving: what if after a year you see that 78% of them became obese? Did your program work? No. It's significantly different maybe, but it didn't work. So let's change the word "different" here to "smaller." We're not interested in finding data that's bigger; then we know it didn't work.
So "didn't work" is unchanged or bigger, but "did work" is going to be smaller. Okay. And I'm going to make sure that decimal is really clear there. So we're treating p as the known proportion that describes the nation; that parameter, we're saying, is 42.4%.
In this class, we're going to be using the Dana Center tools to get a deeper understanding of the connections between sample proportions, p-hat, and how those compare to normal approximations. So I want you to make sure that you have access to the Dana Center tools so that you can watch this video and go along with the activities. It's really important that you do these activities at the same time that I do, because you're going to get different results than I am, and that actually strengthens the arguments that I'm going to be making. And whenever we put information in the Dana Center tool, the p-hats and the p's should really be written as decimals. So you move the decimal place over: if I were using the Dana Center tool, I would put in 0.424 instead of 42.4%.
Okay, so after this class, you're going to understand that as the sample size increases, the standard deviation of the sampling distribution of sample proportions, and since it's a population standard deviation we call it sigma, will decrease. So as your samples get bigger, how much the typical data points fluctuate from the center is going to go down. I think hopefully you already have a pretty good intuitive understanding of that. And maybe you even know the name of the theorem or the law for that. And then for larger samples, the sampling distribution of the sample proportion can be approximated using the normal distribution. The only variable that we're interested in today is p-hat.
And as n gets bigger, our distribution becomes more and more bell-shaped. So that's what we're going to be looking at today. So what you'll be able to do by the end of the class is determine the required sample size for a given standard deviation, so required sample size. That's the last thing we're going to do. you're going to be able to determine whether normal distribution is valid if you can actually use it for the sampling distribution of p-hats based on the sample size. So you're going to be determining the validity. Normal distribution is valid question mark because sometimes it is and sometimes it isn't. And you're going to be able to use the normal distribution to actually calculate percentiles involving sample proportions. And by that we mean calculating the area associated, which gives you the percentage or the probability associated with that result. So let's go. Alright, so looking at the example up here, this example. sorry, I'm getting ahead of myself. You're going to be exploring, you're going to be comparing the sample proportions of varying values. And so let's just hop to it. Let's, let's go right, let's jump right into it. And if this is not hot linked and this is also true in the homeworks, the preview and the practice assignments and just go to your handy dandy homepage which you should have bookmarked and we're interested in sample proportions. So that is going to have its own little section right here. Bam. And we're interested in the proportion one. So I'm going to select that one and go ahead and get it going. All right. So it looks a little, it might look a little better on yours if you're dealing with a laptop, because unfortunately I can't see the graphs until I'm going to be doing a lot of this. So hopefully that's not going to be the case. I hope you have a beautiful laptop and you're not doing this all on an iPad, but I have to do it on an iPad to annotate. All right. 
You'll be using this tool to simulate samples of different sizes from the American adult population. So it's going to be a simulation where the proportion who are obese is calculated for each sample. OK, so that's where we're headed. You would like to simulate a random sample of American adults to measure the proportion who are obese in each sample. What value should you set for your population proportion? We're going to assume that, right from the beginning, the people who attend this clinic are very similar to the American public. So this is before they get exposed to the new program that helps them think about their health. They're not all obese. If they are like the American public, what proportion of them are going to be obese? We're just looking at the adults. So it's going to be 42.4%. That's what we said, right?
So you come over here, and you can't do something that precise using the sliders, so I'm going to click here to be able to type it in. I'm going to just type in 42.4. And it should have a little nervous breakdown. Why did it not? I guess on the iPad it doesn't tell you. Okay, so that was disappointing. I was expecting a little error message, because it falls apart if you enter a percentage. You really need to remember to put it in as a decimal. So 0.424. Okay, good. So we got that.
If you were to draw random samples of American adults and record the proportion of each sample who are obese, how would you predict the shape, center, and variability of the sampling distribution of the sample proportions to change as the sample size increased? Let me get a little bit more room. Maybe I'll slide this over. I'm not going to look at the sampling distribution right away. Before we use technology, what do you think the shape would be?
Shape, center, and variability. And I think the best variability I'm going to use is going to be, let's just go for standard deviation, as how would those change as the sample size increases? So as... As n increases, what happens to the shape? What happens to the center? And what happens to the standard deviation? So what I'd love is if you could write down what you think will happen. So these are all predictions. So there's no wrong answer. So write down what you think will happen. So some people will say the shape doesn't change. It's always normal. Some people will say that the center doesn't change. It's always right smack dab at the center. And some people might say the standard deviation, I don't know, it changes. Maybe it gets bigger, maybe it gets smaller, maybe it stays the same. It's all, we're all just guessing here. Now, I can't guess because I actually know the answer. But I'd like you to think about what you think it's gonna be. And then we're gonna revisit this. Okay. So I'm going to do A and then I think, I think we'll do these all together actually. Okay. So set the population P in the tool to what you had in the question. So I just did that. I did that already. So we've got this in here. Okay. You will need to check. So we have to do, you had to check that little box to get three places past the decimal. And you always want to make it a decimal. Okay. So now for part A, we're going to set the sample size to one. So we're only going to select one person from the clinic. And we're going to measure whether they're obese. And that will be our P hat. So the possible P hats, if you're only looking at one person, is they're either obese or they're not obese, right? That one person. So your P hat is either going to be zero or one, which is a little ridiculous and a little confusing. And we're going to look at one person, put that person back in the roll sheet, and then we'll do it again and we'll do it again. And we'll do it a thousand times. 
So your n is one, because you're looking at one person, and we're going to create a sampling distribution. It's kind of a ridiculous situation, because when do you ever look at just one person in a sample? But okay, let's just do it. So we're going to change n here to one. It never likes my ones; it always thinks of them as L's. Okay, there we go. So we got the one in there. Zero means that the person is not obese. One, the success, is that the person is obese. I'm not going to change the label. Just to get comfy with this, I'm going to do a couple, to remind myself of what this actually looks like. So I'm going to look at one and draw my sample, and there it is. I selected somebody and they happen not to be obese. Cool. If you see the frequency down here, the frequency is one: one person looked at, one time. I'm now going to do it another time. So draw a sample. And this picture right here tells me only my last proportion. So for my last proportion, I did get an obese person. And if you look down here now, you'll see a little tiny bar at 0.0 and a little tiny bar at 1.0. And that little triangle represents the most recent proportion. So it's a little confusing, because we're only looking at one person at a time.
Now they want you to do this a thousand times. So the number of samples is going to be a thousand, while n, the sample size, stays at one. That's a good color for that. This will do, turquoise. So let's go back up here and hit reset, and it disappears. We've got no data to look at. So we're looking at one person, a thousand times, which is a little confusing. And presto bingo. What we see: the top one is what the population looks like. We know that 42.4% of the population are obese, and that's why it's got a probability of about 0.42 associated with it.
You can't really see it, but it's there. Look down here, you get your last result. So this is the most useless one. This down here is our actual sampling distribution, and if you look at the title here, you can see that the average is 0.442, which is pretty darn close to 0.424. And the standard deviation is 0.497. And we learned last time that's really a standard error. It's our approximate standard deviation. So what I would like you to do is draw this sketch over here.
Okay, so you've got a sketch. This is p-hats, n equals one, okay? And pay attention to how the only possibilities are that the one person you look at is not obese, or the person is obese. So those are our possibilities. And when we look at the bars, the kind of royal blue bars, we see that out of the thousand that we looked at, about 558 are not obese and 442 are obese. I won't put the scale there because the scale will change. But I want you to realize that those are not really bars. They're counts: we've got 442 samples where the person is obese and the leftover from a thousand, 558, where the person is not. So that's what that is. And that's our first sampling distribution.
They ask you to be sure to label your axes and provide a descriptive title for your sketch. It's a little tight to put it there, so I'll put the title over here. Title: sample proportions, let's abbreviate that, p-hats, for n equals one, one thousand samples. So 1,000 little dots, 1,000 little p-hats. Okay. Couldn't squeeze that title in there.
All right. So now we're moving on. Instead of having n equals one, we're going to now have n equals five, which is a way better study. Instead of randomly selecting one person from the clinic, we're going to select five people from the clinic and we're going to see what proportion of them are obese.
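If you don't have the Dana Center tool handy, the n equals one run can be roughly imitated in a few lines of Python. This is only a sketch and not the tool itself: the seed, the variable names, and the use of Python's random module are my own assumptions. It draws one simulated adult a thousand times and reports the two counts, the mean, and the standard deviation of the p-hats.

```python
import random
import statistics
from collections import Counter

P = 0.424            # population proportion from the lecture
NUM_SAMPLES = 1000   # number of samples, each of size n = 1

rng = random.Random(9)  # arbitrary seed (an assumption) so the run repeats

# With n = 1, each p-hat is either 1.0 (obese) or 0.0 (not obese).
p_hats = [1.0 if rng.random() < P else 0.0 for _ in range(NUM_SAMPLES)]

counts = Counter(p_hats)
mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)

print("samples with p-hat = 0.0:", counts[0.0])
print("samples with p-hat = 1.0:", counts[1.0])
print("mean of the p-hats:", round(mean_p_hat, 3))
print("SD of the p-hats:", round(sd_p_hat, 3))
```

Your counts will not match the 442 and 558 from the video, and that is the point: every run is a little different, but the mean of the p-hats is always just the fraction of samples that came out obese.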
So Before we do that, I just want to, I'm going to draw that same axis here. Out of the five people, is it possible that we could get zero people who are obese? Yes. Is it possible that we could get all five out of five obese? Yes. So that's one for a hundred percent. And so we're talking about P hat and equals. five this time. So that means that we could get one out of five, two out of five, three out of five, four out of five, or five out of five. So one out of five is 20%. Two out of five is 40%. Three out of five. So this is 0.2 and this is 0.4. I'm just tracking all the possibilities. because whenever you want to do a probability distribution, sampling distribution, any kind of distribution, you want to have an idea of what's possible. So three out of five would be 60%, four out of five would be 80%, and five out of five would be that 100%. So we've really got little tick marks here. I'll choose a different color. So this is possible. This is possible. This is possible. This is possible. So that's possible. So I haven't looked at my distribution yet, but I'm not going to have two lines. I'm now going to have one, two, three, four, five, six possible lines. So let's go and change it. Let's hit reset. So we've got a clean slate. We need to change the sample size. uh, to be n equals, so the, um, the proportion doesn't change, but we're going to have this be n equals five. So I'm going to go ahead and change that to a five, and it was able to read it. And what I'm going to do just for fun, I'm going to, just to make sure I understand what's going on. And, um, I have to say that of all the Dana Center stuff, this one is not the most intuitive. That crescent moon down there, the orange crescent moon represents that if we measured everybody in the clinic, assuming that they were just like the general population. And I'm going to select one sample. So that's not our sample size. It's like one p-hat. 
So I'm actually selecting five people and I'm going to draw my sample. And what I got, that little triangle down there, says that out of five people, I got 20% obese. Yay! 20% obese means that I got one success and four failures. That's all written in the title right here. And if you look down here, we're starting to create our sampling distribution. We've got one little blip, because we only looked at one group of five. Now I'm going to do it again. And there's my second little blip. So now I got 60%, which was three out of five. If you look in the title here, it says two simulations. It doesn't break it down for you, but up here it does: three successes, two failures. So that's how it's working. I'll do it one more time. Bam. And now, oh, looks like I got another three obese people. If we kept doing that, you could see it growing here, but unfortunately, I don't think I can shrink this; it's just off the screen. So you can see it growing. I can't.
I'm going to reset, and it asks for a thousand, so I'll just hit thousand. That means you just looked at a thousand groups. Each group is five people, and you're counting up how many are obese. And look, there it is. So just as we expected, our p-hats fluctuate between zero and a hundred percent. I'm going to draw what I got. Now, what you get might be very different, so it's just an estimation. I can see this one is bigger, and this one, at 40%, is the biggest. Do you think that's surprising? If you look at the frequency down here, it looks like about 300, maybe around 320, of your 1,000 samples came out 40% obese. And then we keep going: the 60% drops a bit, the 80% a little bit more, and there's a tiny bit that was 100% obese.
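To see why the bar at 40% tends to come out tallest, here is a small sketch that lists the six possible p-hats for n equals five and roughly how many of 1,000 samples you would expect at each one. It assumes each person is an independent draw with probability 0.424 of being obese, which is a binomial model; the lecture only simulates, so the formula and the names here are my addition.

```python
from math import comb

P = 0.424            # population proportion from the lecture
N = 5                # sample size
NUM_SAMPLES = 1000   # number of samples drawn in the tool

# Expected number of samples landing on each possible p-hat = k / N,
# assuming a binomial model (independent draws, same P each time).
expected = {}
for k in range(N + 1):
    prob = comb(N, k) * P**k * (1 - P) ** (N - k)
    expected[k / N] = prob * NUM_SAMPLES

for p_hat, count in expected.items():
    print(f"p-hat = {p_hat:.1f}: about {count:.0f} of {NUM_SAMPLES} samples")
```

A simulated run will wobble around these expected counts, so seeing roughly 320 at 40% in one run is in the right neighborhood.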
So that's what our distribution looks like. The title would be p-hat sampling distribution for n equals 5, still a thousand samples. Okay, so that distribution looks a lot better than the one up above. The one up above was ridiculously useless, really. Now we're still going to have a thousand samples, but we're going to up our sample size: instead of looking at five people at a time, we're going to look at 25 people at a time. So I'm going to go ahead and draw my axes here. Still zero and one, and I want to have the same scale. But if I'm looking at 25 people, I could have zero out of 25, one out of 25, two out of 25, and so on. So there's a whole bunch of possible answers for what p-hat could be. Instead of having just those six possibilities from the previous one, there are now going to be 26 possibilities, because we count zero, one out of 25, two out of 25, all the way up to 25 out of 25. If you want, you can try to do 26 little tick marks here. I wasn't able to do it. So that's the best I could do. I'm just going to draw a sketch of it, but I'm anticipating that this one is going to be a lot more filled in.
So I'm going to put a 25 here and I'm going to reset. And just for fun, I'm going to do one. For my first draw, I had eight people who were obese and 17 who were not obese. I'll do it again. You don't have to, but I'd like to see them kind of grow. Oh, I hit reset. Whoops. So that's my first one. Every time it's different: this time I have nine people who were obese and 16 who weren't. I'll do it again. Oh, got the exact same result. That doesn't happen very often. We'll do it one more time. Now I got a new one. And if I look up here, I get the result 14 out of 25.
So that's a hefty one. If I only looked at that p-hat, I would think that the clinic had more obese people than normal. Anyway, I'm now going to do a thousand. I'll hit reset so I only have a thousand to deal with. Draw my sample, and there it is. Oh, just like I expected: there are quite a few little tick marks there for my 26 possibilities with n of 25. And notice I have no zeros and no ones. I'm going to use the same scale as up above. I can see I've got a fair amount at 20%, and almost nothing, just a little bit, below 20%. The peak seems to be right at 40, so I'll give it a peak there. There's actually a little bit more just after it, and then it goes down from there. And the last bar on the right is a little bit left of 70. So, just to get a sense of what it looks like, it doesn't have to be perfect. And it's totally okay if your results are a little different than mine. I just want you to see that it's not as spread out. I'm going to make a mark here. It doesn't really say, but I'm going to guess the spread is 0.19 to 0.68, maybe. So it's much more compact, which makes sense. If you pluck one person, the chance of them being obese, that could happen. But if you pluck 25 people, the chance of all of them being obese is practically zero.
So this is what we got for p-hat with n equals 25, and the title is pretty much the same thing: sampling distribution of 1,000 p-hats, where each time we consider 25 people. Long way of saying that n is 25. So I hope you can see what's going on here. What do you think the next distribution is going to look like? Now we can see the pattern. So next we have 100. Keep the scale the same. Okay.
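The "practically zero" remark can be put into numbers with a quick sketch. It assumes the 25 people are independent draws, each with probability 0.424 of being obese; that independence assumption and the variable names are mine.

```python
P = 0.424   # population proportion from the lecture
N = 25      # sample size

# Probability that a single sample of 25 gives p-hat = 1 or p-hat = 0,
# assuming independent draws with the same P.
p_all_obese = P ** N
p_none_obese = (1 - P) ** N

# For comparison: with n = 1, p-hat = 1 happens with probability P itself.
p_one_person_obese = P

print(f"n = 1, chance of p-hat = 1:  {p_one_person_obese:.3f}")
print(f"n = 25, chance of p-hat = 1: {p_all_obese:.2e}")
print(f"n = 25, chance of p-hat = 0: {p_none_obese:.2e}")
```

With chances that small, it is no surprise that a thousand samples of 25 show no bars at zero or one.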
So we've got zero to one, and 0.4 is about right here, so we'll just extend it down. 0.4 is about right there, because that seems to be an important characteristic. And now if we're looking at a hundred people, we can get every whole percentage: 1%, 2%, 3%, and so on. There are a lot more possibilities. So let's go ahead and try that one. We're going to put a hundred in here, I'm just going to make sure I've hit reset, and then I'm going to draw a thousand of them. So a thousand little p-hats, and each p-hat comes from looking at 100 people. Oh, and look at that. It's a lot more filled in. It's still centered around 40; 40 seems to be showing up right there. It seems like the first little blip is around 30, so 30 is about right here. And then it has a little blip near 60, maybe about right there. Actually, it doesn't quite go to 60, does it? So it's not perfect. I mean, is it perfectly bell-shaped? No. And yours probably isn't either. Oh, you know what? I wasn't paying attention. It kind of precipitously goes down. So I'm going to do one more like this. Your shape is going to be different than mine, but it drops off like that, and then on this side, it goes down, and then it goes up again, and then it just goes like that. Okay, we get the idea.
Let's see, I'm going to zero in. Are there any outliers? Maybe. I mean, that doesn't look like an outlier to me on the tiny end, and on the other end it doesn't either. But maybe one of you gets an outlier on yours. I am going to write down that it says here the mean equals 0.423. Yours is probably slightly different. I'm looking at the title right here, and it says we get a standard deviation of 0.0513. This is really a standard error, because it's coming from a simulation.
So I kind of wish I had written the other ones down, but I didn't and it wasn't asked for. But I'll still say this is p hats, n equals 100. So we're looking at 100 people at a time. That's good enough. I'm not going to write a title for that one. But I definitely see that it seems to be getting, it's looking more continuous. It's less jiggity-jaggity. It's more filled in. Okay. Consider the graphs you drew in question four. So I'm going to move this up here so that. We can look at all those graphs while we're answering. So your graphs are going to be slightly different, but explain what happened to the center. So what happened to the center of our distribution? And I'm just going to get a red. Is the center more or less the same? Yeah, terribly written, center. They all more or less have the same center. Center. So explain what happened to the center of your sampling distribution as the sample size increased. Did it match your prediction in question three? So center. It doesn't change. They all have the same center. All have the same center. And I really hope you guys are doing along with this. You'll see this isn't a fluke with mine, but all of ours will have more or less the same center. And I know this one might not look like the center, but it's a weighted average. So it is the center. It is the average. So it might seem a little bit off, but there are less values on the one than there are on the zero. So the center truly would be right there. Okay. If it were a median, you're right, but I'm thinking of mean. Okay. Explain what happens to the variability. So since I'm kind of gravitating towards mean, I want to use the variability of standard deviation. And they give those to us. So for variability, you could use IQR, you could use range, or you could use standard deviation. We could do all of those actually, and it would still, the pattern would be the same. But visually, I want to do the variability. 
So, variability. For this one, if I want to capture 68% of all the data, I think I probably have to go out that far in both directions to capture that major hump. For the next one, here. And for the next one, do you see that the variability is shrinking as you go along? And for this one at the top, to capture 68%, you've got to go way out. So as the sample size increases, the distance you need to capture that inner 68% gets smaller. That 68% is for a normal distribution, which doesn't make sense with the first one. But if you wanted to do the IQR, it would be the same thing: to capture 50% of the data, you've got to go out further and further for the top graphs. So I'm going to go back and say that the variability decreases as n increases. If it's going down, you're getting more precise information. And that's really nice.
Explain what happens to the shape. Okay. So for this first shape, I don't know what to say. That doesn't have a name. It's bimodal, but the shape is more like a skewed uniform. I guess I could say it's uniform; you could see a rectangle there. This one, it's jiggity-jaggedy, but you're beginning to see a shape. I wouldn't say it's beautifully bell-shaped. This one, all of a sudden, it's looking more bell-shaped, but it is still jiggity-jaggedy. And this one is definitely filled in as bell-shaped. And I'm sure your graphs are about the same. So I'm going to say the shape becomes more bell-shaped, more normal, as n gets bigger. And I'll say that the values seem to all mush together: it's becoming more continuous, even though it's not.
Okay, so what's happening? More bell-shaped, variability decreases, and the center doesn't change as your sample size gets bigger and bigger and bigger. So what does this mean? It means that your distribution is becoming more and more normal as your sample size increases.
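The whole comparison across the four sample sizes can be imitated outside the tool. What follows is a minimal sketch, not the Dana Center simulation: the function name simulate_p_hats, the seed, and the choice of Python's random module are all assumptions of mine. It draws 1,000 samples at each of n = 1, 5, 25, and 100 and prints the mean and standard deviation of the p-hats.

```python
import random
import statistics

P = 0.424                      # population proportion from the lecture
NUM_SAMPLES = 1000             # samples drawn at each sample size
SAMPLE_SIZES = (1, 5, 25, 100)

def simulate_p_hats(n, num_samples, p, rng):
    """Return num_samples sample proportions, each from n simulated adults."""
    p_hats = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

rng = random.Random(2017)  # arbitrary seed (an assumption) so the run repeats

summary = {}
for n in SAMPLE_SIZES:
    p_hats = simulate_p_hats(n, NUM_SAMPLES, P, rng)
    summary[n] = (statistics.mean(p_hats), statistics.pstdev(p_hats))
    print(f"n = {n:3d}: mean = {summary[n][0]:.3f}, SD = {summary[n][1]:.4f}")
```

The means should all hover near 0.424 while the standard deviations shrink as n grows, which is the same pattern as in the four sketches.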
And that is something we've already seen: the central limit theorem. We already said that as your sample size increases, whatever you're trying to predict gets closer and closer to the truth. Well, that is your center. And if you go back and look here, the center here was almost 0.424. It was 0.423. It was super close to the true center. But the central limit theorem tells you more than just that. It tells you about all three characteristics. So in in-class activity 9B, we learned some things about the formulas for the mean and standard deviation, but we're going to really firm it up now, combining all of this together. So this is a huge summary, and it's actually a very important section. It's summarized here in words, and I want to draw a little picture over here. I'll do it in the prettiest color; I really like turquoise. So: p-hat, n big enough. As long as your sample size is big enough, then you know the distribution for p-hat.
The mean, the center, is the true population proportion. So we'll put p right in the middle, and it's purple because it is the true proportion. The standard deviation is given by this amazing formula: the square root of p times 1 minus p, over n. This is sigma. It's the population standard deviation, not an estimate. I'm not going to mash the formula in there.
And that's as long as the sample size is big enough. You've seen this before: 10 successes and 10 failures. 10, 10. Check, check. Remember that? That's what big enough means: you have to check that n times p is greater than or equal to 10, and that n times 1 minus p is greater than or equal to 10. The first is the expected number of successes, and the second is the expected number of failures. As long as your disease isn't so common or so rare that you'd expect fewer than 10 people in either group, then you can be sure that the shape is this beautiful normal curve.
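The summary can be written as one small helper. This is a sketch under the rules just stated: center p, standard deviation equal to the square root of p times 1 minus p over n, and the 10-successes, 10-failures check. The function name describe_p_hat_distribution is hypothetical, made up for this illustration.

```python
import math

def describe_p_hat_distribution(p, n):
    """Center, SD, and the 10/10 check for the sampling distribution of p-hat."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return {
        "mean": p,
        "sd": math.sqrt(p * (1 - p) / n),
        "normal_ok": expected_successes >= 10 and expected_failures >= 10,
    }

for n in (1, 5, 25, 100):
    info = describe_p_hat_distribution(0.424, n)
    print(f"n = {n:3d}: mean = {info['mean']}, "
          f"SD = {info['sd']:.4f}, normal OK? {info['normal_ok']}")
```

Note that n = 25 just barely passes the check here, since 25 times 0.424 is 10.6, while n = 1 and n = 5 do not.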
So we've got our standard deviation marked in both directions, so that we've captured 68%. This is what we can be sure our distribution will begin to look like. So the central limit theorem tells us a lot more than just what the center is. It tells us what the center is, what the spread is, and what the shape is. It tells us all of that. The only really new piece of information is the formula for the standard deviation. Everything else, we've been exposed to.
All right, so we're going to apply this right now. Based on the central limit theorem, what is the approximate distribution of p-hat, the sample proportion of Americans who are obese, when the true proportion is 42.4%? So we're asking what the distribution of all p-hats looks like. That would be a lot to research, but we know it from here. This is the whole world, so I think I'll make it purple. Actually, I think I need to move this over a little bit. Let's drag it over here. Uh-oh. See what I've done here. I'm going to cover that up. I've been hiding this so I have a little more room to work with. That's actually erased. I hope the people at the Dana Center aren't mad that I'm erasing there.
Okay. So we're going to do p-hats here. They come from the data, they're observations, so I'm going to make that orange. And we have n equals 100. That's our sample size. What do we know is the center of this distribution? Center equals what? 0.424. That's our true p, so I'll put it right in the center: 0.424. And by the way, that's the p that we were given, the proportion of obese people. And this is p-hat. I'll just say p-hat is the proportion of obese people in our sample, our national sample. And now let's calculate the spread.
And since we know the true p, we're going to use the standard deviation, and it's not an approximation; it's the actual one. So I'm going to go ahead and use my formula. It's going to be the square root of p times 1 minus p, over n. And we know p is 0.424, so it's the square root of 0.424 times 1 minus 0.424, over 100. That's it, but don't leave it like that. What you want to do, and I think we did this before, is first figure out what 1 minus 0.424 is. Hit that on your calculator and keep it in. Then multiply it by 0.424 and hit enter, keeping that whole string of decimals in your calculator, and then divide by 100. Hit enter. You've still got a string of decimals. The last thing you're going to do is hit the square root button and hit enter. And then, only then, are you going to round. So that we're doing it exactly along the lines of the Dana Center, I would like you to round to four places past the decimal. If you did that right, saving the rounding for the very, very end, you should get 0.0494, which is almost 5%. So that tells us we would expect our sample proportion to be around 42.4%, give or take five percentage points. I wasn't going to be that precise, but why not? So 0.424 minus 0.0494: one standard deviation smaller would be 0.3746. And the other way, 0.424 plus 0.0494, is 0.4734. Okay, so that's my major hump. I'm not going to do the other two; I just don't have it in me to figure those out. That's two standard deviations and that's three standard deviations. I just know it gently slides down. So I'm going to give it a peak. This is my hump, and then it switches and it starts smiling. Within two standard deviations, that's 95% of all data. And within three, that's almost everyone, but it does keep going.
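The calculator steps above can be mirrored in a few lines of code. This is a minimal sketch, not part of the lecture's tool; the function name `sd_of_p_hat` is made up for illustration, and rounding is saved for the very end as the lecture asks.

```python
import math

def sd_of_p_hat(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.424   # population proportion given in the lecture
n = 100     # sample size used in this example

sigma = sd_of_p_hat(p, n)          # keep full precision until the end
print(round(sigma, 4))             # 0.0494
print(round(p - sigma, 4))         # one standard deviation below: 0.3746
print(round(p + sigma, 4))         # one standard deviation above: 0.4734
```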
There's always someone else who's fatter or thinner in both directions. So this should be a little smoother. I'll just smooth it out a little bit. Do the best you can to make it that beautiful symmetric bell-shaped curve. And this right here is 0.0494, which is sigma. And it's not an estimate. This is the model that all the p-hats will be approaching if, and only if, we have a sample size of 100. It all changes if we have a different sample size, except the center. Okay, so that's what it looks like. What is the approximate distribution? I just drew it pretty awesomely; I probably went a little over the top. So we spent the first part looking at the sampling distribution, but the central limit theorem tells us that the distribution of p-hat is approximately normal. That is the fantastic news, as long as n is big enough. Are we sure n is big enough if we have 100? Did we know for sure that n was big enough? So this is number 6. Based on the central limit theorem, I just made that assumption. Let's check. Is n really big enough for this distribution? It better be, because I just assumed it. Well, np is going to be 100 times 0.424, which is 42.4, not 0.424, and that is clearly greater than 10. Yay. And then we also want to check whether n times 1 minus p is bigger than 10: 100 times 1 minus 0.424. First we get the complement: 1 minus 0.424 is 0.576. Times 100, that equals 57.6, which is clearly bigger than 10. So check, check. Since n is big enough, we can be sure that we have this beautiful shape right here, this idealized bell-shaped curve. Okay, so since we know that it's approximately normal, instead of using the sampling distribution, which is not that much fun, we can use the normal distribution, and things are going to go pretty fast then. So here it is.
Go to this normal distribution. So say goodbye to the sampling distribution. For each of the problems, before using technology, do a little sketch. So, what is the approximate probability that at most 35% of the individuals sampled were obese? We've got this picture up here, but I'm not going to take all that time again. I'm just going to go, okay, I know that 0.424 is in the center. And this p-hat of 0.35 is going to be, I don't know, right about here. So 0.35. They're asking, what is the approximate probability? Write the area as a percent exactly as it appears on the graph. So I know that this right here is my cut point. This is the p-hat. And I want to know at most 35%. Does that mean that 35% is the lowest point or the highest point? If at most 35% of the people that I select are obese, does that mean 34, 33, or does it mean 37, 38? Which way do we go? Well, we go this way, down. At most means up to and including. So the approximate probability is going to be this shaded area. That's the percent of p-hats that are at most 0.35. Okay, so I'm going after that area. And now I'm just going to use my lovely Dana Center tool, but I'm going to say goodbye to the sampling distribution and hello to the normal distribution. All I need to do is make sure to put in the proper center of my distribution and the proper spread of my distribution. So I'm going to find a probability, and my center is 0.424. It didn't read that. Okay, 0.424. Yes. And the spread is 0.0494. I'll type that in and hit the key. It's unfortunate that they look so similar, but those are two different numbers.
So all that hard work: they drew the distribution that I was working on over here. They just drew it in a snap, and it is right, but all I need is a sketch. Oh, and the value I want is not 52.08%; the value I want is 35%. I'm doing decimals all along the axis, so I have to be consistent here. So I'm going to enter 0.35. You see, it read it as 35; it's trying to accommodate me, but I'll just come over here and type 0.35. Oh no, that didn't work either. 0.35. Okay. So now, does the sketch look like yours? You want to do your own sketch first so that you can catch yourself if you made a mistake. It's all looking good so far. And it calculated the area for me. It says here, write the area as a percent exactly as it appears on the graph. And I want to write a sentence. You don't have to, but: the chance of getting a p-hat less than or equal to 0.35 is 6.71%. That's the answer, and it's this shaded region right here. I translated at most into less than or equal to. For the answer, you could just put 6.71% if you want. All right, so I'm going to pause now, and I want you to do the same thing for the second and the third questions. So for B and for C, go ahead and work those out and see how you do. Then I'll work them out and we'll see if you got it. And again, please draw a little sketch first for each one of them. A little sketch, then go to technology, so that you can catch yourself if there's some weird switching that goes on. Okay, I hope you paused and did that. So, a quick sketch. I know my distribution has a center like that. I know this is 0.424. And what's the approximate probability that p-hat is bigger than 0.45? So 0.45 is my cut point. I don't know exactly where it is; I'll just say it's right there, 0.45. And we're interested in the p-hats that are bigger. So it's going to be all of these.
So the area is going to equal the probability. I'm interested in the percent of p-hats bigger than 0.45. And notice, I don't have to worry about equality, because in the normal model p-hat is treated as a continuous variable, and the probability of exactly equal is 0%. So I just need to figure that out. I've got my sketch. I'm going to come over here. I've got all this stuff in there the way I want. I want it to be an upper tail, so I'm going to switch that to upper tail. It's not doing it for me, which is interesting. Upper tail. I had this problem last time. Okay, there. Now it's an upper tail, and I need to put in the cut point of 0.45. Let's hope it read it. And that looks pretty good. My 0.45 was a little further away from the center, but I'm convinced that that's okay. And it's going to be 70.07% of p-hats that are bigger. Where are we? I'm not referring to my notes and I'm getting lost. Okay. Oh, wait a minute. I'm glad I looked at my notes. Am I reading it right? Look at that. That area sure looks like less than half, and I just went ahead and read the first number that came to me. I've got to fix that. For the upper tail, it's really 29.93%. I'm going to put that in a different color. So it's all of these. My bad. I'm glad I caught it. I should have been using my sketch to guide my way: it's less than a third. So use that sketch to help you. All right, the next one: what's the probability that it's between 0.40 and 0.50? You know that you've got a normal distribution because n is big enough. You know 0.424 is at the center. We want a little bit less than that, 0.40. I have no idea if that's exactly where 0.40 goes, but it's just a sketch. And then 0.50 looks like it's over here. And we're interested in the area in the middle. We're interested in this.
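The tail areas read off the tool so far, and the middle region just sketched, can be cross-checked with a normal CDF. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, not the Dana Center tool itself; the variable names are made up for illustration, and the spread is the rounded 0.0494 that was typed into the tool.

```python
from statistics import NormalDist

# Normal model for p-hat with the lecture's center and (rounded) spread.
model = NormalDist(mu=0.424, sigma=0.0494)

lower_tail = model.cdf(0.35)                    # P(p-hat <= 0.35)
upper_tail = 1 - model.cdf(0.45)                # P(p-hat > 0.45)
middle = model.cdf(0.50) - model.cdf(0.40)      # P(0.40 <= p-hat <= 0.50)

print(f"{lower_tail:.2%}")   # 6.71%
print(f"{upper_tail:.2%}")   # 29.93%
print(f"{middle:.2%}")
```

Note that the upper tail is one minus the lower-tail area, which is exactly the mix-up the sketch helps you catch.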
And so that's going to be the middle option. And A, the smallest value, is 0.4; not 4, 0.4. Okay, I have to retype that, 0.4. And B is 0.5. And sure enough, that sketch lines up quite nicely. I don't know how to get rid of that, though. Here we go. And you know what, I'm going to keep my p-hats as decimals; otherwise I get a little confused about what's what. So I'm going to keep decimals on the axis, 0.4 and 0.5, and report the area as a percent. The percent of p-hats that are between 0.4 and 0.5 is 62.45%. So more than half of the p-hats are trapped between those two bounds. Okay, so the normal distribution tool is really user-friendly for plugging in cut points. Now we're going to switch, though. We're not going to be using the normal distribution. We're going to do almost the same thing, but using the sampling distribution for a proportion. We're going to have the same center, and the spread is going to be the same because we're keeping the same sample size of 100. So it's going to look very similar to this, but we're using the sampling distribution tool instead. So I'll say goodbye to this one and hello to the sampling distribution. It's right here. We want to put in this value, and we're going to have to type it in and hit enter, because I'm not going to be able to get that precise otherwise. So we'll set the center to 0.424. All right, 0.424. And we're going to have a sample size of 100, so that goes in there. I guess I switched the order around a little bit: sample size, then center. Okay. So we have a sample size of 100, and we are going to do a simulation. The question asks what proportion of the simulated samples have a p-hat less than 0.35. So it's just like what we did up top. So we want 0.35.
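The idea behind this kind of simulation can be sketched in a few lines. This is only an illustration, assuming each sampled person is obese independently with probability 0.424; the function name and the fixed seed are assumptions, and it is not how the Dana Center tool is implemented. A different seed gives a slightly different proportion.

```python
import random

def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    p_hats = []
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

p_hats = simulate_p_hats(p=0.424, n=100, reps=10_000)

# Proportion of simulated p-hats at or below the cut point 0.35.
at_or_below = sum(ph <= 0.35 for ph in p_hats) / len(p_hats)
print(at_or_below)
```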
So how many simulations did they ask us to do? If you read this, it's saying generate a thousand random samples of size 100. So that means you're going to want to ask for the thousand, the 10,000. So you're going to go way over to the right. I'm going to hit it; that's telling it to do 10,000. And I always forget to do this: now I have to draw those 10,000. It hasn't done it yet. Okay, I did it once. Interesting. So I'll hit reset; I got confused. Come on, 10,000. Okay, so we've got it on 10,000, it's a clean slate, and we hit draw. And we've got a whole bunch of p-hats, and there they are. So what proportion of the simulated samples have a p-hat of less than 0.35? How does this value compare to your answer in 7a? So 7a is right here: we got 6.71% from the approximate normal distribution. So how are we going to do this on this one? The directions are written here. You want to enter 0.35 right here, for at or below this value. But before you do that, you have to go to the part of the tool that finds a percent of the sampling distribution. I should have told you this. I'm having all kinds of technical difficulties here. No, no, back, back, back. Not find percentile; that's not the one I wanted. Find the probability. Okay, there we go. So that's the at or below. And then you're going to enter the 0.35. Let's see if it marks it. It doesn't mark it, but it does work it out for you. If you look down here, right by the orange eyeball, find the row for your p-hat.
For 0.35, which is in the first column where it says value, you slide all the way over from the left, and you see what I got; you're going to get something slightly different. For p-hat equal to 0.35, the probability of smaller p-hats equals 0.0783. So it's saying almost an 8% chance, and with the normal we got 6.71%. So that's not a fantastic match, to be honest with you. I'm checking to make sure I didn't make a mistake. But this is my simulated sampling distribution, and every simulation gives you a slightly different answer. I didn't do anything wrong, and you probably have something close to it. The simulation is probably more precise, because the normal is an approximation, but they're really close. So now the next one: what proportion of the simulated samples have a p-hat greater than 0.45? What we did here was the probability that p-hat was less than 0.35, or really less than or equal to. Wait a minute, is the equal to built in? Well, the tool treats it as at or below the value. Yep, so it's built in. Okay, so the next one is greater. We're looking for the probability that our p-hat is greater than 0.45. We can't read that off directly, because this tool only does the lower tail. So I'm going to change this value right here to 0.45. And if you look at the sketch up above, you're interested in bigger than 0.45, but unfortunately this tool is only going to give you at or below. At or below 0.45 is 73.07%. So I know that the leftover is going to be one minus that. I'm going to write: equals one minus the value that they gave me, which is the proportion of p-hats at or below 0.45. So that ends up being 1 minus 0.7307.
And just to make this even more fun, your answer is going to be slightly different from mine, because every simulated sampling distribution has a slightly different result. So I'm going to make sure I did that right. 1 minus 0.7307. I hope I selected 10,000. Yes, I did. Good. So many things can go wrong. So that red area right here is going to be 0.2693. That's what I got. How does that compare to what we did with the normal distribution? With the normal distribution we had 29.93%, so that's actually pretty darn close, isn't it? So it worked out pretty well. But it was a lot more work to get the answer here. And for the next one, the middle region, what you're going to have to do is find the lower-tail value for the first bound and then the lower-tail value for the second bound, and then subtract one from the other to get the middle. I'm just not going to do it. I know that the approximation is pretty good, so I'm going to cheat on this one, because this video is probably running way too long. Too many steps. We know it's pretty close to the normal answer, 62.45%, because the normal approximation is really good at approximating the probabilities. So we're probably going to be stepping away from the binomial distribution. It's often the more precise, but it's cumbersome. And if you get a big enough sample size, it's all going to work out fine. All right, so I basically didn't answer this question; I just skipped it. Okay, so number nine. Now we're revisiting our example. After a year of being on this nutrition program within that clinic, they did a survey, and they found that out of 500 people, 203 were obese. So that's a lot of people. I don't know what that p-hat is. I think I want to work that out right now.
So the p-hat is the number of successes over the total sample size, which was 500. By the way, that's not the distribution we were working with. That distribution had n equal to 100. This is a different distribution; every sample size has a different standard deviation. So this p-hat is going to equal, and I'm going to go to four places past the decimal to mimic what we're getting from the tool, 203 divided by 500. Oh! It decreased: 0.406. So for their sample proportion, instead of p equal to 0.424, there was a reduction. Fewer people are obese. So that's putting a smile on my face. I'm going to be happy that maybe I'm one of them. Now, assume that the true proportion of the clinic's patients who are obese is still p equal to 0.424, the same as for people who were not participating in the program. What are the mean and standard deviation of the distribution of p-hat when you have a sample size of 500? From the central limit theorem, we know the mean, the center, for p-hat is going to be identical to the true population proportion: 0.424. And the standard deviation, sigma, and it really is a sigma, not a standard error, is not given to us, but we can calculate it. It's the square root of p times 1 minus p, over n. So p is 0.424, times 1 minus 0.424, and then down below is n. Now we looked at 500 people, so n is 500. So again, why don't you run that through your calculator to see what you get, holding everything in the calculator. First figure 1 minus 0.424, then multiply it by 0.424, then divide by 500 and hit enter. You've got a decimal there. Save the square root for the very end. Rounding to four places past the decimal, you should get 0.0221.
So your mean is right here and your standard deviation is right here. Okay. And we could draw a sketch of it, but it's really tedious to do. So can you also assume that it's bell-shaped? The question is, can we model the distribution of p-hats as normal? Clearly answer yes or no and defend your answer using mathematical calculations. So how can we check? Well, we need to worry that it was a random sample and things like that. But the only mathematical thing we need to worry about is, was n big enough? You can only assume the shape is normal, or bell-shaped, if n is big enough, and that's not just a check, that's a check, check. You've got to figure out what np is and what n times 1 minus p is. And what's the threshold? The number of successes has to be greater than or equal to 10, and the number of failures has to be greater than or equal to 10. So if we were looking at 500 people, we're going to hope that at least 10 of them are obese and at least 10 are not. My intuition is that with 500 it's almost certainly big enough, but let's do the mathematical calculations anyway. We know that n is 500 and we know that p is 0.424. And when we multiply those together we get 112. Is that right? No, 212, which is clearly greater than or equal to 10. Check. And then for 1 minus p, same n, right? Nothing's changed, so n is still 500. And 1 minus p is the flip of the 42.4%: 1 minus 0.424. When you run that through, you should get 288. And that's clearly greater than 10. So I did my mathematical checks. So n, little n, not capital N, which means the whole population size, is big
enough to assume the p-hat distribution is normal. Okay, we can be sure of that. I'm happy to just draw a tiny little sketch: 0.424 in the center, and it's got that nice shape. So p-hat is normal enough; use the normal distribution. Next: if the true proportion is really 0.424, then in approximately 95% of all random samples of 500, the sample proportion who are obese will fall between blank and blank. So we're asking about 95% of all the data points. How many standard deviations do you have to go? Not one, but two standard deviations. This is your lower bound and this is your upper bound, where you trap the inner 95%. So this right here is going to be the center, mu, minus two sigma, and this right here is going to be mu plus two sigma, with mu being the center and sigma right here, 0.0221. So it's going to be 0.424 minus two times 0.0221, and this one will be 0.424 plus two times 0.0221. When you run those through the calculator, let me just check it: 0.0221 times two, plus 0.424. Okay, so for the upper threshold right here I got 0.4682. And for the lower threshold, when I subtract two standard deviations instead of adding, I got 0.3798. So it's 0.3798 and 0.4682. So if the true proportion for our clinic is 42.4% obese, we would expect a sample proportion of 42.4%, give or take about two percentage points in either direction. So if I want to trap 95%, since I know it's that nice bell-shaped curve, I don't go out two percentage points in both directions, I go out about four in both directions. And that checks out. I'm doing it that way because I can estimate with whole numbers a lot better. And so that checks out.
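The check, check and the two-standard-deviation bounds for n = 500 can be reproduced as follows. This is a small sketch with made-up variable names; it rounds sigma to four places before computing the bounds, as the lecture does.

```python
import math

p, n = 0.424, 500

# "Check, check": expected successes and failures must both be at least 10.
expected_successes = n * p          # about 212
expected_failures = n * (1 - p)     # about 288
normal_ok = expected_successes >= 10 and expected_failures >= 10

# Standard deviation, rounded to four places as in the lecture.
sigma = round(math.sqrt(p * (1 - p) / n), 4)

# Roughly 95% of p-hats fall within two standard deviations of the center.
lower = round(p - 2 * sigma, 4)
upper = round(p + 2 * sigma, 4)
print(normal_ok, sigma, lower, upper)   # True 0.0221 0.3798 0.4682
```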
So the question here is about the results that we were so happy about, these results right here, where p-hat equals 0.406. Well, 0.406 is a little thinner, so maybe it's about right here on the sketch. Does that make us sure that the true proportion isn't 42.4%? Consider the answers in B and C, and then consider the p-hat that was gathered after a year. Do the data after one year provide strong evidence that the percentage of patients in the clinic who are obese, and who participated in this program, is less than the national average? Defend your answer. So I can look at this and say, well, I don't think that's too far away from the center. It says to use a statistical explanation. It's pretty broad how to do this, but how can I show that a score is unusual or not unusual? So, explanation one. Clearly answer yes or no. I'm going to say no: the data are not surprising. The observation p-hat equal to 0.406 is not very surprising if the true proportion of obese people has remained p equal to 0.424. So I'm just saying I'm not that impressed with that result. I'm not going to give that clinic a bunch of money to fight obesity. It's just under two percentage points off, and the standard deviation is a little more than two percentage points. So that's my statement, and now I need to defend my answer using statistical language. My favorite way is the z-score. The z-score for this observation, this p-hat, is z equals observation minus center, over spread. We know that the observation is 0.406, the center is 0.424, and the standard deviation is 0.0221. So what's that z-score? It's 0.406 minus 0.424. Enter. So it's negative: I get negative 0.018, divided by 0.0221. What's that going to be?
Negative 0.814. It keeps going, but I'm going to say it's about that. I think for z-scores you can actually round to just two decimal places, so negative 0.81. As you can see, I eyeballed it pretty well: it's less than one standard deviation from the center. If we look here, that's one standard deviation. So I'm going to say this is less than one standard deviation from the mean and therefore is not an unusual result if the true parameter is equal to 0.424. So, yeah, it was kind of exciting that it was lower; it seemed promising. But when you put it in the actual sampling distribution and you acknowledge the variation that occurs, it's not that surprising. Okay, so the last thing we're going to do is talk about sample size. Intuitively, if you get a bigger sample, you've got less variation in the p-hats. And that's a good thing. But sometimes a researcher will come to you and say, I don't want there to be so much fluctuation; I want the variability to be no more than a certain amount. And this is an example of that. I'm going to show you how to do this by hand. Later I am going to show you technology for it, in case you're thinking, I don't have the math skills to do this on a test, but for the homework that's coming up you have to do it algebraically. Or mechanically; it's not really algebra. So here we go. The large medical clinic would like to do a follow-up study, so they're going to increase the sample size. Part of the reason that result wasn't so interesting was because the variation was pretty high, about two percentage points of fluctuation. So for the follow-up study they would like the standard deviation for the p-hats to be no higher than 1%. We had about 2%. Assume again that the true proportion is really equal to 0.424. How many individuals would you need to ensure that the standard deviation of the sample proportions would be no more than 1%?
Okay. So, well, I really don't know the answer to this off the top of my head, but I know it's going to have to be more than 500, because a sample size of 500 led to a standard deviation of about 2%. So we're going to have to increase that. So I'll start with the formula. I know that in general the standard deviation, sigma, is equal to this formula: the square root of p times 1 minus p, over n. And what they're asking is how many individuals; that is n. That's code for, what's the sample size? So we're asking what the sample size should be so that the whole thing is no more than 1%. Did they say less than, or equal to? They said no higher than, so less than or equal to. So we want sigma to be less than or equal to 0.01; that's 1% written as a decimal. And we know that sigma is this formula, the square root of p times 1 minus p, over n, so that's what has to be less than or equal to 0.01. Well, we know what p is, don't we? p is still 0.424. So I'm going to pop that in: 0.424, and then this is going to be 1 minus 0.424. And it's really the n that we don't know. So if you had a happy experience in childhood with algebra, good, because this just becomes algebra. We just need to solve for n. We want the first n that works. So you're going to have to do some algebra, and the steps are always going to be the same: you want to get n by itself. And trust me, later there's going to be some software for this. The first thing I notice is that this square root sign is messing everything up. So I wish that wasn't there.
So I'm going to go ahead and get rid of the square root sign by squaring both sides. If I square a square root, I wipe it out. So what I'll be left with is just 0.424 times 1 minus 0.424, over n. I wiped the square root sign out. That has to be less than or equal to the square of the other side, and I can square that side and know the relationship is going to stay the same because there are no negatives floating around. So what is 0.01 times 0.01? It's going to be 0.0001. Kind of shocking that when you multiply a small number by itself, it gets even smaller. So I now want to solve this. I'm going to put a one under the 0.0001 so both sides look like fractions. Now, I know that a lot of you have been taught to cross multiply. You don't want to do that in this situation, because you lose track of which way the less than points. So I'm going to use the golden rule, which is whatever I do to one side, as long as I do the same thing to the other side, everything's good. But first, can I figure out what this is right here? I think that'll look a lot nicer. I'm going to clean up as I go along. So first I'm going to do the 1 minus 0.424, and for this part I got 0.576, but I'm going to hold it in my calculator and now multiply it by 0.424. So that's what I got, and I'm going to hold it: 0.244224 over n is less than or equal to 0.0001 over 1. Okay, so I need to solve for n. And I don't want to cross multiply, because a lot of people don't even know where that comes from, and it gets you into a lot of trouble. I'm going to multiply both sides by n, because the n is on the bottom and that is not a good thing. So it cancels here and I get 0.244224, which is still in my calculator waiting for me, equals 0.0001 times n. So I wish that 0.0001 wasn't there. So to get rid of it: it's multiplying.
So to kill a multiplying number, what you do is just divide by it. Divide by 0.0001, that's three zeros after the decimal point and then the one, and that will cancel. And whatever you do to one side, you have to do to the other side. So those cancel and I just get my lovely black n on this side. Cancel, cancel. I'm going to make sure I don't drop a decimal place. The 0.244224 is still in my calculator, and I'm now going to divide it by 0.0001. And I get n is 2,442.24. So with 500, my standard deviation was a little more than 2%. The company didn't like that. And I'm going to go back to them and say, if you want the give or take, the standard deviation, to change so that instead of a typical value fluctuating 2% from the center it now fluctuates 1%, you're going to have to increase your sample size to 2,442.24. Only a scientist would say that, because can you have 0.24 of a human being? No. You can't have a fraction of a person; that would be a dead person. So the question is, is it going to be 2,442 people or 2,443 people? Well, you could say, if I round this to a whole number, it's going to be 2,442, but you would be wrong. What this is saying is that 2,442.24 is the threshold that hits exactly a fluctuation of 1%. If you make the sample size a little bit smaller, then the standard deviation will be a little bit bigger than 1%, because the standard deviation grows as your sample size shrinks, and shrinks as your sample size grows. So you can't round down, even though that's the closer whole number. It has to be 2,443. That is the number of people that gets you the standard deviation you wanted. And the truth is, when you're trying to find a sample size, if you get a decimal, you always round up. So, tip: when finding a sample size for a set standard deviation, you always round up.
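The algebra above boils down to n being at least p times 1 minus p, divided by the target standard deviation squared, and then rounding up. Here is a minimal sketch of that rule; the function name `min_sample_size` is made up for illustration.

```python
import math

def min_sample_size(p: float, max_sd: float) -> int:
    """Smallest whole n with sqrt(p * (1 - p) / n) <= max_sd."""
    threshold = p * (1 - p) / max_sd ** 2   # about 2442.24 for the lecture's numbers
    return math.ceil(threshold)             # always round up, never down

n_needed = min_sample_size(p=0.424, max_sd=0.01)
print(n_needed)                                      # 2443

# One person fewer puts the standard deviation back above the 1% target.
print(math.sqrt(0.424 * 0.576 / 2442) > 0.01)        # True
print(math.sqrt(0.424 * 0.576 / n_needed) <= 0.01)   # True
```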
Okay. Because if you round down, even just the tiniest little bit, you've now increased your standard deviation. You don't want to do that. So, the steps for doing this, if you have this on your homework, which you will: you can look at this video again and just replay the steps. The numbers will be slightly different. Get rid of the square root sign, clean everything up, and then multiply and divide both sides until n is by itself. Oh, I switched it to equal, didn't I? If I keep the less-than-or-equal sign, that tells you n has to be greater than or equal to that decimal, so you've got to round up. Well, never noticed that before. Okay, so let's look at what we did in this video. We did a lot of things, but it does actually boil down to a really nice, succinct message. The message here is that if you are dealing with p-hats, the center of the sampling distribution will be the true p. The standard deviation, sigma, is given by this formula. And the shape is going to be normal as long as n is big enough. The threshold for big enough is that np and n times 1 minus p both have to be at least 10. So let's see if we covered all of this. As the sample size increases, the standard deviation decreases. We spent a lot of time on that. For large samples, the sampling distribution can be assumed to be normal. And by large samples, we now know that means np is greater than or equal to 10 and n times 1 minus p is greater than or equal to 10. So that's what large enough means. Oops. Okay. And we just did a very messy look at how to find the required sample size. It's a whole lot of algebra. Follow those steps at the end. There'll be one problem like that. And determine if the normal approximation is valid: there, that's the check for valid. This is the check, check. And how do you find percentiles? You use the Dana Center tool, the DCMP tool. The sampling distribution tool is a bit of a pain. It only gives you the lower tail, but the normal distribution will give you the upper tail.
It'll give you the center part. It's a lot easier to deal with. So you're hoping that your sample size is big enough and you can just go to the normal distribution. It'll make life a lot easier. Okay. So we're done. So take a little bit of a break. And then the, this was kind of a long video. The good news is that the homework, I don't think is that long. All right. Bye you guys.
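
The sample-size algebra from this video can be sketched in a few lines of Python. This is only an illustration, not part of the Dana Center tools: the function name `required_sample_size` is made up here, and the numbers are the ones used in the video (p = 0.424, target standard deviation 0.01).

```python
import math

def required_sample_size(p, target_sd):
    """Return the threshold p(1-p)/sd^2 and the smallest whole n at or above it.

    Hypothetical helper: solves sqrt(p*(1-p)/n) <= target_sd for n.
    """
    threshold = p * (1 - p) / target_sd ** 2
    # Always round up: rounding down would push the standard deviation above the target.
    return threshold, math.ceil(threshold)

threshold, n = required_sample_size(0.424, 0.01)
print(round(threshold, 2), n)  # 2442.24 2443
```

Rounding up matters here: with 2,442 people the standard deviation comes out just above 0.01, and with 2,443 it comes out just below.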

Example number one, well, warm-up question. In 2017-18, the National Health and Nutrition Examination Survey estimated something shocking: 42.4% of American adults fit the medical definition of obese, which is overweight past a certain threshold. When you hit a certain level, you go from being just pleasantly plump and overweight to obese.

And 42.4% is dangerously close to half of all Americans. Now, if we're in California, we might be more shocked by this than people in other states, because I do think it varies from state to state, and California tends to be one of the thinner states.

The Midwest tends to be heavier, but overall we're going to treat this as the number that describes everyone. So it's a parameter. A large medical clinic instituted a wellness and nutrition program. They started something just for their clients, where the patients could opt in to receive text messages with nutrition and exercise tips, or use an app to monitor their dieting and activity levels. So the question the clinic would like to answer is whether, after a year on the program, the proportion of its patients who are obese is less than the national average.

So we're going to accept that this is an accurate estimate. I'm going to circle that and say p equals this. That's a population proportion, so not p-hat.

We're going to assume that this is the true proportion of American adults who are obese. So question number one: as a statistician on a project, explain how you could conduct a study to decide if there is evidence that the rate of obesity among the clinic's patients is less than 42.4%. Now, the best thing would be an experiment, but it is unethical to experiment on people, especially if you don't get their permission. So I'm going to sculpt this a little bit more. You could do an experiment, but let's not do it on the people.

Because they might get mad; they're just there for wellness. It's a large medical clinic. So think of a study, an observational study, that could help you. Okay, pause and write it down.

Okay, so my suggestion here is to look at the people in the clinic. So take a random sample of people.

Oh, no, I've got to be careful, because there are probably children: take a random sample of the adults in the clinic who opted into the program. We're only interested in looking at the people who actually did the program. So take a random sample of them, wait a year, and give them time to get exposed.

And then measure. So, one, two, three: measure the proportion of the random sample who are obese. And compare that, and it's actually going to be a p-hat, to the national average, I should say national proportion, to see if that p-hat is different from 42.4%.

Okay, so it's a simple enough study. Just wait a year! You know, so you probably only want to look at the people who are actually doing the program.

You want to restrict it to adults, because that's what that proportion has to do with. I think the obesity level for children is different. Now, the one thing I would improve is this: what if after a year you see that 78% of them became obese? Then did your program work?

No. It's significantly different, maybe, but it didn't work. So let's change the word "different" here to "smaller."

We're not interested in finding data that's bigger, because then we know it didn't work. So "didn't work" is unchanged or bigger, but "did work" is going to be smaller. Okay. And I'm going to make sure that decimal is really clear there.

So we're treating p as 42.4%, the known proportion that describes the nation. That parameter, we're saying, is 42.4%. In this class, we're going to be using the Dana Center tools to get a deeper understanding of the connections between sample proportions, p-hat, and how those compare to normal approximations.

So I want you to make sure that you have access to the Dana Center tools so that you can watch this video and go along with the activities. It's really important that you do these activities at the same time that I do, because you're going to get different results than I am, and that actually strengthens the arguments that I'm going to be making.

So make sure that you have that. And whenever we put information into the Dana Center tool, the p-hats and the p's should really be written as decimals. So you move the decimal point over two places: 42.4% becomes 0.424. If I were using the Dana Center materials, I would make sure to put that in instead.

Okay, so after this class, you're going to understand that as the sample size increases, the standard deviation of the sampling distribution of sample proportions (sigma, if we're talking about the population standard deviation) will decrease. So as your samples get bigger, the fluctuation of the data, how much the typical data points fluctuate from the center, is going to go down. But hopefully you already have a pretty good intuitive understanding of that. And maybe you even know the name of the theorem for that, or the law. And then for larger samples, the sampling distribution of the sample proportion can be approximated using the normal distribution.

So the only variable that we're interested in today is our sample proportion, p-hat. And as n gets bigger, our distribution becomes more and more bell-shaped. So that's what we're going to be looking at today.

So what you'll be able to do by the end of the class is determine the required sample size for a given standard deviation. That's the last thing we're going to do. You're going to be able to determine whether the normal distribution is valid, whether you can actually use it for the sampling distribution of p-hats, based on the sample size. So you're going to be determining the validity: normal distribution is valid, question mark, because sometimes it is and sometimes it isn't.

And you're going to be able to use the normal distribution to actually calculate percentiles involving sample proportions. By that we mean calculating the associated area, which gives you the percentage or the probability associated with that result. So let's go. Alright, so looking at the example up here... sorry, I'm getting ahead of myself.

You're going to be exploring, you're going to be comparing the sample proportions of varying values. And so let's just hop to it. Let's, let's go right, let's jump right into it.

And if this is not hot-linked, and this is also true in the homework, the preview, and the practice assignments, just go to your handy-dandy homepage, which you should have bookmarked. We're interested in sample proportions, so that is going to have its own little section right here. Bam. And we're interested in the proportion one.

So I'm going to select that one and go ahead and get it going. All right. So it looks a little, it might look a little better on yours if you're dealing with a laptop, because unfortunately I can't see the graphs until I'm going to be doing a lot of this.

So hopefully that's not going to be the case. I hope you have a beautiful laptop and you're not doing this all on an iPad, but I have to do it on an iPad to annotate. All right.

You'll be using this tool to simulate samples of different sizes from the American adult population. So it's going to be a simulation where the proportion of people who are obese is calculated for each sample. OK, so.

That's where we're headed. So you would like to simulate a random sample of American adults to measure the proportion who are obese in each sample. What value should you set for your population proportion?

So we're going to assume that, right from the beginning, the people who attend this clinic are very similar to the general public. This is before they get exposed to the new program that helps them think about their health. So they're not all obese. If they are like the American public, what proportion of them are going to be obese?

We're just looking at the adults. So it's going to be 42.4%. That's what we said, right?

So you're going to come over here, and you can't do something that's precise using the sliders. So I'm going to click here to be able to type it in. And I'm going to just type in 42.4. And it should have a little nervous breakdown. Why did it not?

I guess on the iPad, it doesn't tell you. Okay, so that was disappointing. It didn't have a little nervous breakdown, but it should have, because it falls apart. You really need to remember that you put it in as a decimal.

Okay. So I was expecting a little error message. So 0.424.

Okay, good. So we got that. If you were to draw random samples of American adults and record the proportion of each sample who are obese, how would you predict the shape, center, and variability of the sampling distribution of the sample proportions to change as the sample size increased? So I'm on this one right now.

So we need a little bit more room. Maybe I'll slide this over. I'm not going to look at the sampling distribution right away. Before we use technology, how would you predict the shape would be?

So what do you think the shape would be? Shape, center, and variability. And for variability, let's just go for standard deviation. How would those change as the sample size increases?

As n increases, what happens to the shape? What happens to the center? And what happens to the standard deviation? So what I'd love is if you could write down what you think will happen.

So these are all predictions. So there's no wrong answer. So write down what you think will happen.

So some people will say the shape doesn't change. It's always normal. Some people will say that the center doesn't change. It's always right smack dab at the center.

And some people might say the standard deviation, I don't know, it changes. Maybe it gets bigger, maybe it gets smaller, maybe it stays the same. It's all, we're all just guessing here. Now, I can't guess because I actually know the answer.

But I'd like you to think about what you think it's gonna be. And then we're gonna revisit this. Okay.

So I'm going to do A and then I think, I think we'll do these all together actually. Okay. So set the population P in the tool to what you had in the question.

So I did that already. We've got this in here. Okay.

You had to check that little box to get three places past the decimal. And you always want to make it a decimal. Okay. So now for part A, we're going to set the sample size to one. So we're only going to select one person from the clinic.

And we're going to measure whether they're obese. And that will be our P hat. So the possible P hats, if you're only looking at one person, is they're either obese or they're not obese, right?

That one person. So your p-hat is either going to be zero or one, which is a little ridiculous and a little confusing. And we're going to look at one person, put that person back in the roll sheet, and then we'll do it again and again. We'll do it a thousand times. So your n is one, because you're looking at one person, and we're going to create a sampling distribution.

It's kind of a ridiculous situation, because when do you ever look at just one person in a sample? But okay, let's just do it. So we're going to change n; n here is going to go there, and we're going to change that to one.

It never likes my ones; it always thinks of them as L's. Okay, there we go. So we got the one in there. Zero means that you are not obese. One, the success, is that you are obese.

I'm not going to change the label. Just to get comfy with this, I'm going to do a couple to remind myself of what this actually looks like.

So I'm going to look at one and draw my sample, and there it is. I selected somebody and they happen not to be obese. Cool. I'm going to do it again. So I'm going to keep that person.

I've got a frequency. If you see the frequency down here, the frequency is one. One person looked at one time. I'm now going to do it another time. So draw a sample.

And oh, so this picture right here tells me only my last proportion. For my last proportion, I did get an obese person. And if you look down here now, you'll see a little tiny bar at 0.0 and a little tiny bar at 1.0.

And that little triangle represents the most recent proportion. So it's a little confusing, because we're only looking at one person at a time.

Now they want you to draw a thousand samples. So the number of samples is going to be a thousand, while the sample size n stays at one. That's a good color for it. This will do: turquoise.

So let's go back up here and hit reset. And so it disappears. We've got no data to look at.

So we're looking at one person a thousand times, which is a little confusing. And presto bingo, what we see: the top one is what the population looks like. We know that 42.4% of the population are obese, and that's why it's got a probability of 0.424 associated with it.

You can't really see it, but it's there. Look down here, you get your last result. So this is the most useless. This down here is our actual sampling distribution and what we got, if you look at the title here, you can see that the average is 0.442, which is pretty darn close to 0.424. And the standard deviation is 0.497.

And we learned last time that's really standard error. It's our approximate standard deviation. So what I would like you to do is I would like you to draw this sketch over here.

Okay, so you've got a sketch. And pay attention to the only possible p-hats here; these are p-hats for n equals one, okay? The only possibilities are that the one person you look at is not obese, or the person is obese.

So those are our possibilities. And when we look at the bars, the kind of royal blue bars, we see that out of the thousand that we looked at, 442 are obese and 558 are not.

So I won't put the scale there, because the scale will change. But I want you to realize that those are not just bars. We've really got 442 people who are obese and 558, the leftover from a thousand, who are not. So that's what that is. And that's our first sampling distribution.

So, next: they ask you to be sure to label your axes and provide a descriptive title for your sketch. It's a little tight to put it there, so I'll put the title over here. Title: sample proportions, let's abbreviate that, p-hats for n equals one, one thousand samples.

So 1,000 little dots, 1,000 little p-hats. Okay.

So couldn't squeeze that title in there. All right. So now we're moving on.

And instead of having N equals one, we're going to now have N equals five, which is a way better study. Instead of randomly selecting one person from the clinic, we're going to select five people from the clinic and we're going to see what proportion of them are obese. So Before we do that, I just want to, I'm going to draw that same axis here.

Out of the five people, is it possible that we could get zero people who are obese? Yes. Is it possible that we could get all five out of five obese? Yes. So that's one, for a hundred percent.

And so we're talking about p-hat with n equals five this time. That means we could get zero out of five, one out of five, two out of five, three out of five, four out of five, or five out of five.

So one out of five is 20%. Two out of five is 40%. So this is 0.2 and this is 0.4.

I'm just tracking all the possibilities, because whenever you want to do a probability distribution, sampling distribution, any kind of distribution, you want to have an idea of what's possible. So three out of five would be 60%, four out of five would be 80%, and five out of five would be that 100%.

So we've really got little tick marks here. I'll choose a different color. So this is possible. This is possible.

This is possible. This is possible. So that's possible. I haven't looked at my distribution yet, but I'm not going to have two lines anymore. I'm now going to have one, two, three, four, five, six possible lines.
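
The list of possible p-hats for a given sample size can be written out with a short Python sketch. The helper name `possible_p_hats` is made up for illustration; it just divides each possible count of obese people, 0 through n, by n.

```python
def possible_p_hats(n):
    """All values p-hat can take when the sample size is n (hypothetical helper)."""
    return [k / n for k in range(n + 1)]

print(possible_p_hats(1))        # [0.0, 1.0]
print(possible_p_hats(5))        # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(len(possible_p_hats(25)))  # 26
```

A sample of size n always has n + 1 possible p-hats, which is why the picture gets more filled in as n grows.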

So let's go and change it. Let's hit reset. So we've got a clean slate.

We need to change the sample size. The proportion doesn't change, but we're going to have n equal five. So I'm going to go ahead and change that to a five, and it was able to read it.

And just for fun, just to make sure I understand what's going on, I'm going to draw a few one at a time. I have to say that of all the Dana Center stuff, this one is not the most intuitive. That orange crescent moon down there represents what we'd get if we measured everybody in the clinic, assuming that they were just like the general population. And I'm going to select one sample.

So that's not our sample size. It's one p-hat. I'm actually selecting five people, and I'm going to draw my sample. And what I got, that little triangle down there says that out of five people, I got 20% obese.

Yay! 20% obese means that I got one person, one success, and four failures. That's all written in the title right here, if you look at it. And if you look down here, we're starting to create our sampling distribution. And we've got that.

One little blip because we only looked at one group of five. Now I'm going to do it again. And there's my second little blip. So now I got 60%, which was three out of five. And it's saying, if you look in the title here, it says two simulations.

It doesn't break it down for you, but up here it does. Three successes, two failures. So that's how it's working.

I'll do it one more time. Bam. And now, oh, looks like I got another, another three fat people or obese. I don't know if fat's a bad word.

Sorry. So if we kept doing that, you could see it growing here, but unfortunately I don't think I can shrink this; it's just off the screen. So you can see it growing. I can't.

I'm going to reset, and instead, since it asks for a thousand, I'll just hit thousand. That means you just looked at a thousand groups. Each group is five people, and you're counting up how many are obese. And look, there it is.

So just as we expected, our p-hats fluctuate between zero and a hundred percent. I'm going to draw what I get. Now, what you get might be very different. So it's just an estimation.

I can see this is bigger. This is the biggest. And do you think that's surprising? If you look at the frequency down here, it looks like about 300, maybe around 320, of your 1,000 samples landed on that value.

And then we keep going: the 60% drops a bit, the 80% a little bit more, and there's a tiny bit that was 100% obese. So that's what our distribution looks like. So the title would be p-hat sampling distribution for n equals 5, still a thousand samples.
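
What the tool is doing for n equals 5 can be imitated with a small Python simulation. This is a sketch under stated assumptions, not the Dana Center tool's actual code: the function name `simulate_p_hats` and the seed are made up, and each person is treated as independently obese with probability 0.424.

```python
import random
from collections import Counter

def simulate_p_hats(p, n, num_samples, seed=None):
    """Draw num_samples samples of size n and return each sample's p-hat.

    Hypothetical helper; assumes each person is obese independently with probability p.
    """
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))  # count of obese people in this sample
        p_hats.append(successes / n)
    return p_hats

p_hats = simulate_p_hats(p=0.424, n=5, num_samples=1000, seed=9)
counts = Counter(p_hats)
for value in sorted(counts):
    print(f"p-hat = {value:.1f}: {counts[value]} samples")
```

Just like in the video, the exact counts change from run to run (or from seed to seed), but the bars only ever land on 0, 0.2, 0.4, 0.6, 0.8, and 1.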

Okay, so now, so, oh, well, that distribution looks a lot better than the one up above. The one up above was ridiculously useless, really. So now we're still going to have a thousand samples, but we're going to up our sample size instead of looking at five people at a time, we're going to look at 25 people at a time. So I'm going to go ahead and draw my axes here. And I'm going to think about it.

I think still zero and one, and I want to have the same scale. But if I'm looking at 25 people, it means I could have zero out of 25, I could have one out of 25, two out of 25. There need to be 26 little tick marks along this. So that's a lot. There's a whole bunch of possible answers for what p-hat could be.

So instead of having just those few, 1, 2, 3, 4, 5, 6 possibilities from the previous one, there's now going to be 26 possibilities, because you have 0 out of 25, 1 out of 25, 2 out of 25, all the way up to 25 out of 25. So if you want, you can try to do 26 little tick marks here. I wasn't able to do it: 19, 20, 21. So that's the best I could do. I'm anticipating that this one is going to have a lot more. I'm just going to draw a sketch of it, but I know it's going to be a lot more filled in. So I'm going to put a 25 here and I'm going to reset.

And just for fun, I'm going to do one. And I got for my first draw, I had eight people who were obese and 17 who were not obese. So that was my first one. I'll do it again. Just you don't have to, but I'd like to see them kind of grow.

Oh, I hit reset. Whoops. So that's my first one.

Now, every time it's different. This time I have nine people who were obese and 16 who weren't.

I'll do it again. Oh, got the exact same result. That doesn't happen very often. Let's see. We'll do it one more time.

Now I got a new one. And if I look up here, I get the result 14 out of 25. So that's a hefty one. If I only looked at that p-hat, I would think that the clinic had more obese people than normal. So anyway, I'm now going to do a thousand.

I'll hit reset so I only have a thousand to deal with. Draw my sample and did I draw my sample? And there it is. Oh, just like I expected.

There's quite a few little tick marks there for my possibilities with 25. And notice I have no zeros and no ones. For my spread, I'm going to use the scale up above. I can see I've got a fair amount at 20%.

And then I have almost nothing, just a little bit below 20%. The peak seems to be right at 40. So I'll give it a peak there. And then there's actually a little bit more afterwards.

And then it goes down from there. And my farthest point on the right is a little bit left of 70. So, just to get a sense of what it looks like; it doesn't have to be perfect. And it's totally okay if your results are a little different than mine. I just want you to see that it's not as spread out.

I'm going to make a mark here. It doesn't really say, but I'm going to guess that the spread is maybe 0.19 to 0.68. So it's much more compact, which makes sense: if you pluck one person, the chance of them being obese, that could happen.

But if you pluck 25 people, the chance of all of them being obese is practically zero. So this is what we got for p-hat with n equals 25, and the title is pretty much the same thing: sampling distribution of 1,000 p-hats, where each time we consider 25 people. Long way of saying that n is 25. So I hope you can see what's going on here.
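
To put a number on "practically zero," here is a tiny Python sketch. It assumes the people in a sample are independent, each obese with probability 0.424, so the chance that all n are obese is 0.424 multiplied by itself n times. The function name `prob_all_obese` is made up for illustration.

```python
def prob_all_obese(p, n):
    """Chance that every one of n independently chosen people is obese (hypothetical helper)."""
    return p ** n

for n in (1, 5, 25):
    print(n, prob_all_obese(0.424, n))
```

For one person the chance is 0.424, but for 25 people it is on the order of 10 to the minus 10, so a p-hat of 1 essentially never shows up in a thousand samples.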

What do you think the next distribution is going to look like? So now we can see the pattern here. So if we have 100. So we're still, so keep the scale the same.

Okay. So we got zero to one and in the center about is, so 0.4 is about right here. So we'll just extend it down. 0.4 is about right there. Cause that seems to be an important characteristic.

And now if we're looking at a hundred people, we can get every whole percentage, not just 10%, 20%, 30%. There's a lot more possibilities. So let's go ahead and try that one. Now we have 100, so we're going to put a hundred in here, and I'm just going to make sure I've hit reset. And then I'm going to draw a thousand of them.

So a thousand little p-hats, each p-hat coming from a hundred people. Oh, and look at that. It's a lot more.

So now it's still centered around 40. 40 seems to be showing up right there. But it seems like the first little blip is around 30. So 30 is about right here. And then it has a little blip at 60. So 60, maybe it's about right there.

And actually, it doesn't quite go to 60, does it? So it's not perfect. I mean, is it perfectly bell shaped?

Uh, no. And yours probably isn't either. Oh, you know what? I wasn't paying attention. It kind of precipitously goes down.

So I'm going to just do one more like this. Your shape is going to be different than mine, but it drops off like that, and then on this side, so we get the idea, it goes down and then it goes up again and then it just goes like that. Okay, and let's see, I'm going to zero in: are there any outliers? Maybe. I mean, that doesn't look like an outlier to me on the tiny end. And on the other end, it doesn't either. But maybe one of you gets an outlier on yours. But I am going to write down that it says here the mean equals 0.423.

Yours is probably slightly different. So I'm looking at the title right here. and it says that the standard deviation, and this is really standard error because it's coming from a simulation, but we get a standard deviation of 0.0513. So I kind of wish I had written the other ones down, but I didn't and it wasn't asked for. But I'll still say this is p hats, n equals 100. So we're looking at 100 people at a time.

That's good enough. I'm not going to write a title for that one. But I definitely see that it seems to be getting, it's looking more continuous. It's less jiggity-jaggity.

It's more filled in. Okay. Consider the graphs you drew in question four.

So I'm going to move this up here so that. We can look at all those graphs while we're answering. So your graphs are going to be slightly different, but explain what happened to the center.

So what happened to the center of our distribution? And I'm just going to get a red. Is the center more or less the same? Yeah. Terribly written: center. They all more or less have the same center.

So explain what happened to the center of your sampling distribution as the sample size increased. Did it match your prediction in question three?

So center. It doesn't change. They all have the same center.

And I really hope you guys are following along with this. You'll see this isn't a fluke with mine; all of ours will have more or less the same center. And I know this one might not look like the center, but it's a weighted average. So it is the center. It is the average.

So it might seem a little bit off, but there are fewer values on the one than there are on the zero. So the center truly would be right there. Okay. If it were a median, you'd be right, but I'm thinking of the mean.

Okay. Explain what happens to the variability. So since I'm kind of gravitating towards mean, I want to use standard deviation for the variability. And they give those to us.

So for variability, you could use IQR, you could use range, or you could use standard deviation. We could do all of those, actually, and the pattern would be the same. But visually, I want to do the variability. So for variability, we look at this, because it's kind of hard to... there we go.

This one, if I want to capture 68% of all the data, I think I probably have to go out that far in both directions to capture that major hump. For the next one, here.

And for the next one, do you see that the variability is shrinking as you go along? And for this one, to capture 68%, you've got to go way out. So as the sample size increases, the distance you have to go to capture that inner 68% shrinks, if it is normal, which doesn't make sense with the first one. But if you wanted to do the IQR, it would be the same thing. To capture 50% of the data, you've got to go out further and further for the top graphs.

So I'm going to go back and say that the variability decreases as n increases. If it's going down, you're getting more precise information. And that's really nice.
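
The pattern in the four sketches (same center, shrinking variability) can be checked with a short Python simulation. This is only a sketch, not the Dana Center tool itself: the names `simulate_p_hats` and `results` and the seed are made up, and each person is assumed to be obese independently with probability 0.424.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, rng):
    """Return num_samples simulated p-hats, each from a sample of size n (hypothetical helper)."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

rng = random.Random(2018)  # arbitrary seed so the run is repeatable
results = {}
for n in (1, 5, 25, 100):
    p_hats = simulate_p_hats(0.424, n, 1000, rng)
    # Mean and standard deviation of the 1,000 simulated p-hats.
    results[n] = (statistics.mean(p_hats), statistics.pstdev(p_hats))
    print(f"n = {n:3d}: mean = {results[n][0]:.3f}, sd = {results[n][1]:.4f}")
```

The means should all sit near 0.424, while the standard deviation drops each time n goes up, in line with the 0.497 the video reports for n = 1 and the 0.0513 it reports for n = 100.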

Explain what happens to the shape. Okay. So what happens to the shape? For this shape, I don't know what to say.

That doesn't really have a name. Oh, it's bimodal, but the shape is more like a skewed uniform.

I guess I could say it's uniform. You could see a rectangle there. This one, it's jiggity jaggedy, but you're beginning to see a knight.

I wouldn't say it's beautifully bell-shaped. This one, all of a sudden, is looking more bell-shaped, but it is still jiggity-jaggedy. And this one is definitely filled in as bell-shaped. So, explain what happened to the shape of the sampling distribution; I'm sure your graphs show the same thing.

The shape becomes more bell-shaped, more normal, and I'll say that the values seem to all mush together. It's becoming more continuous, even though it's not, as n gets bigger.

Okay so what's happening? More bell-shaped, variability decreases, and the center doesn't change as your sample size gets bigger and bigger and bigger. So what does this mean?

It means that your distribution is becoming more and more normal as your sample size increases. And that is, we've already seen that, the central limit theorem. We already said that as your sample size increases, whatever you're trying to predict gets closer and closer to the truth.

Well, that is your center. And if you go back and look here, the center here was almost 0.424. It was 0.423. It was super close to the true center.

But the central limit theorem tells you more than just that. It tells you about all three characteristics. In class activity 9b, we learned some things about the formulas for the mean and standard deviation, but now we're going to really firm it up by combining all of this together. So this is a huge summary and a very important section. It's summarized here in words, and I want to draw a little picture over here. I'll do it in the prettiest color, I really like turquoise. So: p-hat, n big enough.

As long as your sample size is big enough, then you know the distribution for p-hat. The mean, the center, is the true proportion. So we'll put p right in the middle, and it's purple because it is the true population proportion.

So the center is the true proportion. The standard deviation is given by this amazing formula.

So this is sigma. It's not an estimate, it's the population standard deviation. That sigma is given by this formula, the square root of p times 1 minus p over n, but I'm not going to mash that in there. And all of this holds as long as the sample size is big enough.

And you've seen this before. 10 failures and 10 successes. 10, 10. Check, check. Remember that? That's what big enough means.

You have to check that np is greater than or equal to 10 and that n times 1 minus p is greater than or equal to 10. The first is the expected number of successes, and the second is the expected number of failures.

As long as your disease isn't so common or so rare that you have fewer than 10 people in either pool, then you can be sure that the shape is this beautiful normal curve. So we've got our standard deviation in both directions so that we've captured 68%. So this is what we can be sure our distribution will begin to look like. And that's what the central limit theorem says.

So the central limit theorem tells us a lot more than just what the center is. It tells us what the spread is and what the shape is. It tells us all of that. The only really new piece of information is the formula for the standard deviation. Everything else we've kind of been exposed to.
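As a compact restatement of that summary, here is the same information in symbols. The labels $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$ are my own notation for the center and spread of the p-hats, not something written on the board:

```latex
\hat{p} \;\approx\; N\!\left(\mu_{\hat{p}} = p,\;\; \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\right)
\quad\text{provided}\quad np \ge 10 \ \text{ and } \ n(1-p) \ge 10 .
```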

All right. So we're going to apply this right now. Based on the central limit theorem, what is the approximate distribution of p-hat, the sample proportion of Americans who are obese, when the true proportion is 0.424? So we're asking, what does the distribution of all p-hats look like?

So that'd be a lot to research but we know it from here. So I'm gonna, this is the whole world, so I think I'll make it purple. Actually I think I need to move this over a little bit.

Let's drag it over here a little bit. Uh-oh. See what I've done here.

I'm going to cover that up. I've been hiding this so I have a little more room to work with. That's actually erased.

I hope the people at the Dana Center aren't mad that I'm erasing there. Okay. So, gee, what's...

So we're going to do p-hats here. They come from the data, so I'm going to make it orange. They're observations, p-hat, and we have n is 100. That's our sample size, n equals 100. What do we know is the center of this distribution?

Center equals what? 0.424. That's our true p.

So I'll put it right in the center, 0.424. And by the way, that's the p that we were given for obese people.

And this is p-hat. I'll just say p-hat is the proportion of obese people in our sample, our national sample. And now let's calculate the spread.

And since the center is a mean, we're going to use the standard deviation, and it's not an approximation, it's the actual one. So I'm going to go ahead and use my formula. It's going to be the square root of p times 1 minus p over n. And we know p is 0.424, so that's the square root of 0.424 times 1 minus 0.424, over 100. That's it, but don't leave it like that. So what you want to do, and I think we did this before, is first figure out what 1 minus 0.424 is.

Hit that on your calculator and keep it in. So that's the first thing you want to do. Then you're going to multiply it by 0.424 and hit enter, keeping that whole string of decimals in your calculator, and then divide by 100. Hit enter. You've still got a string of decimals.

And then the last thing you're going to do is hit the square root button and hit enter. And then, only then, are you going to round. And so that we're doing exactly along the lines of the Dana Center, I would like you to round to four places past the decimal. So if you did that right, saving the rounding for the very, very end, you should get 0.0494, which is almost 5%. So that tells us that we would expect our sample proportion to come out around 42.4%, give or take five percentage points. I'm not going to be that precise.

Well, why not? So 0.424 minus 0.0494. Over here, one standard deviation smaller would be 0.3746. And the other way, 0.424 plus 0.0494, is 0.4734.

Okay. So that's my major hump. And I'm not going to do the other two.

I just don't have it in me to figure those out. That's two standard deviations and that's three standard deviations. I just know it gently slides down. So I'm going to give it a peak.

This is my hump. And then it switches and it starts smiling. That's 95% of all data. And then almost everyone, but it does keep going.

There's always someone else who's fatter or thinner in both directions. So this should be a little smoother. I'll just smooth it out a little bit. So do the best you can to make it that beautiful symmetric bell-shaped curve.

This right here is 0.0494, which is sigma. And it's not an estimate. This is the model that all p-hats will be approaching if, and only if, we have a sample size of 100. It all changes if we have a different sample size, except the center. Okay, so that's what it looks like. What is the approximate distribution?
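The calculator steps above, plus the two- and three-standard-deviation marks that were skipped on the sketch, can be checked with a few lines of Python. This is only a sketch using the lecture's values (p = 0.424, n = 100); the variable names are my own.

```python
import math

p = 0.424   # true proportion, from the lecture
n = 100     # sample size, from the lecture

# Keep full precision until the very end, then round to four places.
sigma = math.sqrt(p * (1 - p) / n)
print(round(sigma, 4))  # 0.0494

# Tick marks at 1, 2, and 3 standard deviations on each side of the center.
ticks = {k: (round(p - k * sigma, 4), round(p + k * sigma, 4)) for k in (1, 2, 3)}
for k, (low, high) in ticks.items():
    print(k, low, high)
```

The first row should reproduce the 0.3746 and 0.4734 marks worked out by hand.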

So I just drew it pretty awesomely. I probably went a little over the top. So we spent the first part looking at the sampling distribution, but the central limit theorem tells us that the distribution of p-hat is approximately normal.

So that is the fantastic news, as long as n is big enough. Are we sure n is big enough if we have 100? Did we know for sure that n was big enough?

So this is number 6. Based on the central limit theorem, I just made that assumption. Let's check. Is n really big enough for this distribution? It better be, because I just made that assumption.

Well, np is going to be 100 times 0.424, which is 42.4. Sorry, it's 42.4, not 0.424, which is clearly greater than 10. Yay. And then we also want to check whether n times 1 minus p is bigger than 10: 100 times 1 minus 0.424.

So first we get that complement. 1 minus 0.424 is 0.576, and times 100 that equals 57.6.

57.6, which is clearly bigger than 10. So check, check. Since n is big enough, we can be sure that we have this beautiful shape right here. An idealized, beautiful bell-shaped curve.
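The check-check can be written as a tiny helper. This is a minimal sketch; the function name `large_enough` and the second example (a rare condition in a small sample) are made up for illustration.

```python
def large_enough(n, p, threshold=10):
    """Success/failure check: both expected counts must reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, ok

successes, failures, ok = large_enough(100, 0.424)
print(round(successes, 1), round(failures, 1), ok)  # 42.4 57.6 True

# A hypothetical case where the check fails: a rare condition in a small sample.
print(large_enough(50, 0.05)[2])  # False
```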

Okay, so since we know that it's really normal, instead of using the sampling distribution, which is not that much fun, we can use the normal distribution, and things are going to go pretty fast then.

So here it is. Go to this normal distribution. So say goodbye to the sampling distribution. For each of the problems, before using technology, do a little sketch.

So we want a little sketch first. What is the approximate probability that at most 35% of the individuals sampled were obese? We've got this picture up here, but I'm not going to take all that time. I'm just going to go, okay, I know that 0.424 is in the center. And this 0.35, this p-hat, is going to be,

I don't know, right about here, maybe. 0.35.

So they're asking, what is the approximate probability? So we want the probability. Write the area as a percent exactly as it appears on the graph.

So I know that this right here is my cut point. This is the p-hat. And I want to know at most 35%. Does that mean that 35% is the lowest point or the highest point? If at most 35% of the people that I select

are obese, does that mean 34, 33, or does it mean 37, 38? Which way do we go? Well, we go this way. At most means up to and including.

So the approximate probability is going to be this shaded area. That's the percent of p-hats that are at most 0.35. So I'm going after that area.

And so now I'm just going to use my lovely Dana Center tool, but I'm going to say goodbye to sampling distribution and hello to normal distribution. And all I need to do is make sure to put in the proper center and the proper spread of my distribution. So I'm going to find the probability, and my center is 0.424.

It didn't read that. Okay, 0.424. Yes.

And the spread is point... I think I'll just give up on that and then hit the key. Okay.

So the spread is 0.0494. It's unfortunate that it's so similar to the center, but those are two different numbers. So all that hard work: they drew the distribution that I was working on over here in a snap, and it is right, but all I need is a sketch.

So, oh, and the value I want is not 52.08%; the value I want is 35%. I'm using decimals all along the axis, so I have to be consistent here. So I'm going to do

0.35. You see, it read it as 35. It's trying to accommodate me, but I'll just come over here: 0.35. Hopefully... oh no, that didn't work either. 0.35. Okay. So now, does the sketch look like yours? You kind of want to do your own sketch first so that you can catch yourself if you made a mistake. It's all looking good so far. And it calculated the area for me. So how do you want to do this? It says here, write the area as the percent exactly as it appears on the graph. And I want to write a sentence.

You don't have to, but: the chance of getting a p-hat less than or equal to 0.35 is 6.71%. That's the answer, and it's this shaded region right here. I translated at most into less than or equal to. So for the answer you could just put 6.71% if you want. All right, so I'm going to pause now, and I want you to do the same thing for the second and the third question. So for B and for C, go ahead and work those out, see how you do. And then I'll work them out and we'll see if you got it. All right.
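If you don't have the tool handy, the same lower-tail area for part A can be sketched with Python's standard library. This assumes the same center (0.424) and rounded spread (0.0494) that were typed into the tool; `NormalDist` is from the built-in `statistics` module.

```python
from statistics import NormalDist

# Center and spread from the lecture (n = 100, p = 0.424).
model = NormalDist(mu=0.424, sigma=0.0494)

# "At most 0.35" is the lower tail, i.e. the area at or below 0.35.
lower_tail = model.cdf(0.35)
print(f"{lower_tail:.2%}")  # 6.71%
```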

And again, please draw a little sketch first for each one of them. A little sketch, then go to technology so that you can catch yourself if there's some weird switching that goes on. Okay.

I hope you paused and did that. A quick sketch. I know my distribution has a center like that. I know this is 0.424. And what's the approximate probability that p-hat is bigger than 0.45?

So 0.45 is my cut point. I don't know exactly where it is; I'll just say it's right there, 0.45. And we're interested in the p-hats that are bigger.

So it's going to be all of these. So the area is going to equal the probability. I'm interested in the percent of p-hats

bigger than 0.45. And notice, I don't have to worry about equality, because in the normal model p-hat is treated as a continuous variable, and the probability of being exactly equal is 0%. So I just need to figure that out.

I've got my sketch. I'm going to come over here. All I need to do, I've got all this stuff in there the way I want.

I want it to be an upper tail. So I'm going to switch that to an upper tail, maybe. It's not doing it for me, which is interesting. Upper tail. I had this problem last time.

Okay, there. Now it's an upper tail, and I need to put in the cut point of 0.45. Let's hope it read it.

And that looks pretty good. My 0.45 was a little further away from the center, but I'm convinced that that's okay. And it's going to be 70.07% of p-hats are bigger.

But you could just write... all you need is the decimal. Where are we? I'm not referring to my notes and I'm getting lost. Okay. Oh, wait a minute. I'm glad I looked at my notes.

Am I reading it right? Look at that. That area sure looks like less than half.

And I just went ahead and read the first number that came to me. I've got to fix that. For the upper tail, it's really... let me change that.

I'm going to put it in a different color: 29.93%. So it's all of these, my bad. Oh, I'm glad I caught it.

And I should have been using my sketch to guide my way: it's less than a third. So use that sketch to help you. All right.

So the next one: what's the probability that it's between? You know it's a normal distribution because n is big enough. You know 0.424 is at the center. 0.40 is a little bit less than that. I have no idea if that's exactly where it goes, but it's just a sketch. And then 0.50 looks like it's over here. And we're interested in the area in the middle.

And so that's going to be the middle option. And A, the smallest value, is 0.4. Not 4, 0.4. Okay, I have to resort to that, 0.4. And B is 0.5. And sure enough, that sketch lines up quite nicely.

I don't know how to get rid of that, though. Here we go. And so it looks like the proportion

of p-hats between 40 and 50... and you know what, I'm going to keep my p-hats as decimals. Otherwise I get a little confused about what's what. So I'm going to keep decimals on the axis. So 0.4 and 0.5. The proportion, or the percent. Let's do that.

The percent of p-hats that are between 0.4 and 0.5 is 62.45 percent. So more than half of the p-hats are trapped between those two bounds. Okay, so the normal distribution is really user-friendly for plugging in cut points.
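Parts B and C can be checked the same way. Again this is only a sketch with the lecture's center and rounded spread: the upper tail is one minus the area at or below the cut point, and the middle area is the difference of two "at or below" areas.

```python
from statistics import NormalDist

model = NormalDist(mu=0.424, sigma=0.0494)

# B: area above 0.45 is everything that is NOT at or below 0.45.
upper_tail = 1 - model.cdf(0.45)

# C: area between 0.40 and 0.50.
middle = model.cdf(0.50) - model.cdf(0.40)

print(f"{upper_tail:.2%}")  # 29.93%
print(f"{middle:.2%}")      # 62.45%
```

Notice that `model.cdf(0.45)` on its own is about 70.07%, which is exactly the number that got misread for the upper tail.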

Next we're going to switch, though. We're not going to be using the normal distribution. We're now going to do almost the same thing, but using the sampling distribution for a proportion. And we're going to have the same center, and the spread is going to be the same because we're keeping the same sample size

of 100. So it's going to look very similar to this, but we're using the sampling distribution instead. So I'll say goodbye to this one and hello to the sampling distribution. It's right here. And we want to put in this value, so we're going to have to type it in and hit enter; I'm not going to be able to get that precise otherwise. So we'll set the center to 0.424. All right, 0.424. And we're going to have a sample size of 100. So the sample size of 100 goes in there. I guess I switched the order around a little bit.

So sample size, then center. Okay. So we want a sample size of 100. And we are going to do a simulation. So: the proportion of the simulated samples with a p-hat less than 0.35. So it's just like what we did up top.

So we want to have 0.35. How many simulations did they ask us to do? If you read this, it's saying generate 10,000 random samples of size 100. So that means you're going to want to ask for the 10,000. So you're going to go way over to the right. So I'm going to hit it.

It's telling it to do 10,000. And I always forget to do this. I'm going to now draw that 10,000 and... it hasn't done it yet.

Okay. I did it once. Interesting. Oh, so I'll hit reset. I got confused. Come on, 10,000. Okay, so we've got it on 10,000 and it's a clean slate. Hit draw, and we've got a whole bunch of p-hats, and there they are. So what proportion of the simulated samples have a p-hat of less than 0.35? How does this value compare to the answer in 7a? So 7a is right here.

We got this right here, 6.71%, from the approximate normal distribution. That was the answer we got for approximate normal. So how are we going to do this on this one?

The way we do it, the directions are written here. You want to enter 0.35 right here, where it asks what proportion is

at or below this value. So before you do that, you have to hit find percent of the sampling distribution. I don't know if I said to do that.

I should have told you this, but go up here to... I'm having all kinds of technical difficulties here.

No, no, back, back, back. So you're going to find percentile... that's where it says which percent. No, not the one I wanted.

Find the probability. Okay, so there we go. So that's the at or below. And then you're going to enter the 35%, the 0.35.

And let's see if it marks it. It doesn't mark it, but it does work it out for you. If you look down here, where I've got the orange eyeball, you have a p-hat of 0.35, which is where it says value, and you slide all the way over to the left. You're going to get something different, but I got that for p-hat equal to 0.35, the probability of smaller p-hats

equals 0.0783. So it's saying almost an 8% chance, and we got 6.71% before. So that's not a fantastic match, to be honest with you. And I'm checking to make sure I didn't make a mistake. But this is my simulated sampling distribution, and every simulation gives you a slightly different answer.

But I didn't do anything wrong, and you probably have something close to it. So this is probably more precise, because the other is an approximation, but it's really close.

It's really, really close. So now the next one: what proportion of the simulated samples have a p-hat greater than 45%? So what we did here was the probability that p-hat was less than 0.35, or less than or equal to. Oh, wait a minute. Did I want the proportion less than 0.35?

Is the equal to built in? Well, we're treating it as the value at or below. Yep.

So it's built in. Okay. So the next one here.

Greater. So we're looking for the probability that our p-hat is greater than 0.45. Okay.

So we don't directly have a value for that, because this software only does the at-or-below direction. So I'm going to change this right here to 0.45. And if you look up here, you're interested in bigger than 0.45.

But unfortunately, this tool is only going to give you less than. So below 0.45 is giving us... 73. If we use the sketch up above, it's telling you that this area right here is 73.07%.

So I know that the leftover is going to be one minus that. So I'm going to write: equals one minus the value that they gave me. That value is the less than value,

the proportion of p-hats less than 0.45. Okay, so that ends up being 1 minus 0.7307. And just to make this even more fun, your answer is going to be slightly different than mine, because every simulated sampling distribution has a slightly different result.

So I'm going to make sure I did that right. 1 minus 0.7307. I hope I selected 10,000.

Yes, I did. Good. 10,000. Good. Okay.

Equals. So many things can go wrong. So that red area right here is going to be 0.2693.

That's what I got. How does that compare to what we did with the normal distribution?

With the normal distribution we had this right here, 29.93%. That's actually pretty darn close, isn't it? So it worked out pretty well. But it was a lot more work to get the answer here.
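To see why everyone's simulated numbers differ a little, here is a rough imitation of what the simulation tool is doing, written with Python's built-in `random` module. This is my own sketch, not the Dana Center's code; the seed is arbitrary and only there so the run is repeatable.

```python
import random

p, n, num_samples = 0.424, 100, 10_000
rng = random.Random(9)  # arbitrary seed so the run is repeatable

# Each simulated sample: count how many of n people are "obese" with chance p.
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(num_samples)]

# Compare counts rather than decimals to avoid floating-point edge cases:
# p-hat <= 0.35 is the same as count <= 35, and p-hat <= 0.45 is count <= 45.
at_or_below_035 = sum(c <= 35 for c in counts) / num_samples
at_or_below_045 = sum(c <= 45 for c in counts) / num_samples

# The tool only reports "at or below", so the upper tail is the leftover.
above_045 = 1 - at_or_below_045

print(at_or_below_035, above_045)
```

Change the seed and the two printed values wiggle a little, which is the same run-to-run variation described above.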

And for the next one, the middle region, you will have to find the at-or-below value for the first bound and then the at-or-below value for the second bound, and then subtract, and you have to be careful to get that right. I'm just not going to do it. I know that the approximation is pretty good, so I'm going to cheat on this one, because this video is probably running way too long. Too many steps. We know it's pretty close to that answer, 62.45%, because the normal approximation is really good at approximating the probabilities. So

we're probably going to be stepping away from the binomial distribution. It's often the more precise one, but it's cumbersome. And if you get a big enough sample size, it's all going to work out fine.

All right. So I basically didn't answer this question. I just skipped it.
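For anyone who wants to try the skipped middle question, here is one way the subtraction could go, using the exact binomial distribution instead of a simulation. This is a sketch under my own assumptions: the helper `binom_cdf` is hypothetical, and because counts are whole numbers, the answer depends on whether the endpoints 0.40 and 0.50 are counted, so both versions are shown.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 100, 0.424

# Endpoints included: 40 through 50 obese people out of 100.
inclusive = binom_cdf(50, n, p) - binom_cdf(39, n, p)

# Endpoints excluded: 41 through 49.
exclusive = binom_cdf(49, n, p) - binom_cdf(40, n, p)

print(round(inclusive, 4), round(exclusive, 4))
```

The normal answer of 62.45% lands between these two values, which is one reason the comparison looks close or not so close depending on how the endpoints are treated.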

Okay. So number nine: now we're revisiting our example. After a year on this nutrition program within that clinic, they did a survey, and they found that out of 500 people, 203 were obese. So that's a lot of people.

I don't know what that p-hat is. I think I want to work that out right now. So the p-hat is the number of successes over the total sample size, which was 500, which by the way is not the distribution we were working with.

That distribution had n of 100. That's a different distribution. Every sample size has a different standard deviation. So this p-hat is going to equal, and I'm going to go to four places past the decimal to kind of mimic what we're getting from the technology.

So I'm going to go four places past. So 203 divided by 500. Oh!

It decreased: 0.406. So for their sample proportion, instead of the p of 0.424, there was a reduction. Fewer people are obese.

So that's putting a smile on my face. I'm going to be happy that maybe I'm one of them. So assume that the true proportion of the clinic's patients who are obese is 0.424. So we're assuming that p equals this.

This is the proportion for people who were not participating in the program. What is the mean and standard deviation of the distribution of p-hat when you have a sample size of 500? From the central limit theorem, we know the mean, the center, for p-hat is going to be identical.

It's going to be the true population proportion, 0.424. And we know that the standard deviation, sigma, and it really is a sigma,

it's not a standard error, is not given to us, but we can calculate it. It's the square root of p times 1 minus p over n.

So it's going to be p times 1 minus p, and down below is going to be n. So p is 0.424, times 1 minus 0.424, and then n. Oh, now we looked at 500 people.

So again, why don't you run that through your calculator to see what you get. Go to four places past the decimal, holding everything in the calculator. So first figure 1 minus 0.424, then multiply that by 0.424, then divide by 500 and hit enter. You've got a decimal there. And then save the square rooting for the very end.

And you should get, rounding to four places past the decimal, 0.0221. So your mean is right here and your standard deviation is right here.

Okay. And we could draw a sketch of it, but it's really tedious to do. So can you also assume that it's bell-shaped?

So the question is, can we model the distribution of p-hats as normal? Can we say the shape is bell-shaped? Clearly answer yes or no and defend your answer using mathematical calculations.

So how can we check? Well, we need to worry that it was a random sample and things like that. But the only mathematical thing we need to worry about is, was n big enough?

You can only assume that the shape is normal, or bell-shaped, if n is big enough, and that's not just a check, that's a check-check. You've got to figure out what's np and what's n times 1 minus p. And what's the threshold?

The number of successes has to be greater than or equal to 10, and the number of failures has to be greater than or equal to 10. So if we were looking at 500 people, we're going to hope that at least 10 of them are obese and at least 10 are not. My intuition is that with 500 it's almost certainly big enough, but let's do the mathematical calculations anyway.

So pop it in. We know that n is 500 and we know that p is 0.424. And when we multiply those together, we get 112. Is that right? No, 212. 212, which is clearly greater than or equal to 10. Check.

And then for one minus p, same n, right? Nothing's changed. So n is still 500. And one minus p is the flip of the 42.4%, one minus 0.424. When you run that through, you should get 288. And that's clearly greater than 10. So I did my mathematical checks.

So n, lowercase, not capital N, which means the whole population size, is big enough to assume the p-hat distribution is normal. Okay, we can be sure of that. So I'm happy to just draw a tiny little sketch: 0.424 in the center, and it's got that nice shape. So p-hat is normal enough; use the normal distribution. If the true proportion is really this, then in approximately 95 percent of all random samples of 500, the sample proportions who are obese will fall between blank and blank.

So we're asking about 95% of all the data points. How many standard deviations do you have to go? Not one, but two standard deviations. This is your lower bound and this is your upper bound, where you trap the inner 95%.

So this right here is going to be mu minus two sigma, and this right here is going to be mu plus two sigma, mu being the center, and sigma is right here. So it's going to be 0.424 minus two times... so many decimals, where is it? Oh, it's right here, 0.0221. And then this one will be a little easier to see: 0.424 plus two times 0.0221.

So when you run those through the calculator, you should get the thresholds. I'm going to just check it: 0.0221 times two, plus 0.424. Okay, so for this upper threshold right here, I got 0.4682.

And for this lower threshold, when I subtract two standard deviations instead of adding, I got 0.3798. So it's 0.3798 and 0.4682.

So if the true proportion for our clinic is 42.4% obese, the sample proportions vary by about two percentage points in both directions for one standard deviation. So if I want to trap 95%, since I know it's that nice bell-shaped curve, I don't go out two in both directions, I go out about four in both directions, and that checks out. I'm doing it that way because I can estimate with whole numbers a lot better. And so that checks out.
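Here is the same two-standard-deviation calculation as a quick Python check, using the lecture's p = 0.424 and n = 500. It is just a sketch; the variable names are mine.

```python
import math

p, n = 0.424, 500

# Standard deviation of the p-hats for samples of 500, rounded only at the end.
sigma = math.sqrt(p * (1 - p) / n)

# Bounds that trap roughly the middle 95% of p-hats.
lower = p - 2 * sigma
upper = p + 2 * sigma

print(round(sigma, 4), round(lower, 4), round(upper, 4))  # 0.0221 0.3798 0.4682
```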

So the question here is, if we look at the results that we were so happy about, these results right here, which means that p-hat equals 0.406. Well, 0.406 is a little thinner, maybe about right here. Does that make us sure that the true proportion isn't 42.4%? Consider the answers in B and C, and then consider the p-hat that you gathered

after a year. Do the data after one year provide strong evidence that the percentage of patients in the clinic who participated in the program and are obese is less than the national average? Defend your answer. So I can look at this and say, well, I don't think that's too far away from the center. But it says to use a statistical explanation.

So it's pretty broad how to do this, but how can I show that a score is unusual or not unusual? So, explanation one. Clearly answer yes or no. I'm going to say no, the data is not surprising.

The observation p-hat equal to 0.406 is not very surprising if the true proportion of obese people has remained p equal to 0.424. So I'm just saying I'm not that impressed with that result. I'm not going to give that clinic a bunch of money to fight obesity. It's just under two percentage points off, and the standard deviation is about two percentage points. So I don't think the z-score is going to be very big. So that's my statement.

I need to defend my answer. So one way to defend it is to answer using statistical language. My favorite way is the z-score. The z-score for this observation,

for this p-hat, is: z equals observation minus center, over spread. We know that the observation is 0.406, the center is 0.424, and the standard deviation is 0.0221. So what's that z-score?

So 0.406 minus 0.424, enter. It's negative. I get negative 0.018, divided by 0.0221. What's that going to be?

Negative 0.814. It keeps going, but I'm going to say it's about that. I'm going to round.

I think for z-scores you can actually round to just two decimal places, but as you can see, I eyeballed it pretty well. It's less than one standard deviation from the center. If we look here, that's one standard deviation. So I'm going to say this is less than one standard deviation from the mean and therefore is not an unusual result for the true parameter equal to 0.424. So, yeah, it was kind of exciting that it was definitely lower; it seemed like it was promising.
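The z-score arithmetic is short enough to check in a couple of lines. This sketch uses the rounded standard deviation 0.0221 from the lecture; the helper name `z_score` is my own.

```python
def z_score(observation, center, spread):
    """How many standard deviations the observation sits from the center."""
    return (observation - center) / spread

z = z_score(0.406, 0.424, 0.0221)
print(round(z, 2))  # -0.81
print(abs(z) < 1)   # True: within one standard deviation of the center
```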

But when you put it in the actual sampling distribution and you acknowledge the variation that occurs, it's not that surprising. Okay, so the last thing we're going to do is talk about sample size. Intuitively, if you get a bigger sample, you've got less variation in the p-hats. And that's a good thing.

But sometimes a researcher will come to you and say, I don't want there to be so much fluctuation; I want to make sure that the variability is below a certain level. And this is an example of that. I'm going to show you how to do this. And later, if you are like, I don't have the math skills to do this on a test, I am going to show you technology, but for the homework that's coming up, you have to do it algebraically.

So here we go. Or mechanically; it's not really algebra. Okay, the large medical clinic would like to do a follow-up to the study. So they're going to maybe increase the sample size. And part of the reason that that result wasn't so interesting was because the variation was pretty high, about two percentage points of fluctuation.

So a follow-up study, but they would like the standard deviation for the p-hats to be no higher than 1%. We had about 2%. Assume again that the true proportion is really equal to 0.424.

How many individuals would you need to ensure that the standard deviation of the sample proportions would be 1%? Okay. So, well, I really don't know the answer to this offhand. I know it's going to have to be more than 500, because a sample size of 500 led to a standard deviation of about 2%.

So our standard deviation was about 2% when we had a sample size of 500. So we're going to have to increase that. So maybe I'll start with the formula. I know that in general the standard deviation, sigma, is equal to this formula: the square root of p

times 1 minus p over n. Okay. And what they're asking is how many individuals, and that is n.

That's code for what's the sample size. Okay. So we're asking, what should the sample size be so that the whole thing is no more than 1%?

So this is what sigma equals, and we want it to be no more than 1%. Okay.

So I want it to not just equal, but be less than. So I'm going to say less than. So we want sigma to be less than 0.01.

Okay, so we know that sigma is this formula, the square root of p times 1 minus p over n, and we want it to be less than... or did they say equal to? No higher than, so less than or equal to. Okay, so that's 0.01, that little one percent. Well, we know what p is, don't we? P is still 0.424.

So I'm going to pop that in, 0.424. And then this is going to be 1 minus 0.424. Okay. And it's really the n that we don't know.

So if you had a happy experience. in childhood with, so it's got to be less than or equal to 0.01. Well, this just becomes, huh, just algebra.

We just need to solve for n. We just need to find. the n that works and we want the first n that works.

So you're going to have to do some algebra to do this and the steps are going to be the same. You want to get n by itself so and then trust me later there's going to be some some software for this. But everything the first thing I notice this square root sign is messing everything up. So I wish that wasn't there.

So I'm going to go ahead and I'm going to get rid of the square root sign by squaring both sides. And so if I square a square root, I kind of wipe it out. So what I'll be left with then is just 0.424 times (1 minus 0.424) over n.

I wiped the square root sign out. That has to be less than or equal to, and I can square this side and I know the relationship's going to stay the same because there's no negatives floating around. So what is 0.01 times 0.01? It's going to be 0.0001.

Kind of shocking that when you multiply a small number by itself, it gets even smaller. So I now want to solve this, and there's lots of ways to do it.

I'm going to put a one under this. Now, I know that a lot of you have been taught to cross multiply. You don't want to do that for this situation, because you don't know where the less-than sign ends up. So I'm going to use the golden rule, which is whatever I do to one side, as long as I do the same thing to the other side, everything's good. But also, can I figure out what this is right here?

Can I just work this out? I think that'll look a lot nicer. I'm going to clean up all the way along.

So clean up both sides. So first I'm going to do the 1 minus 0.424. And for this part, I got 0.576,

but I'm going to hold it in my calculator and I'm now going to multiply it by 0.424. So that's what I got, 0.244224. So 0.244224 over n is less than or equal to 0.0001 over 1. Okay, so I need to solve for n to figure this out. And I don't want to cross multiply because a lot of people don't even know what that is or where it comes from. It gets you into a lot of trouble.

I'm going to multiply both sides by n, because the n's on the bottom and that is not a good thing. So it cancels here and I get 0.244224, which is still in my calculator, still there waiting for me, equals 0.0001 times n.

So I wish that that 0.0001 wasn't there. It's multiplying, so to kill a multiplying number, what you do is you just divide by 0.0001 and that will cancel.

And whatever you do to one side, you have to do to the other side. And careful, it's three zeros after the decimal point. And so I get n, because those cancel, and I just get my lovely black n on this side. Cancel, cancel.

I'm going to make sure I don't drop a decimal. It's still in my calculator. I'm going to now divide it by 0.0001.

And I get n is 2,442.24. So with 500, my standard deviation was a little more than 2%. The company didn't like that.
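
The same algebra can be done in a couple of lines of Python. This is only a sketch of the steps above, with variable names chosen for illustration: squaring both sides and solving for n gives p(1 - p) divided by the target standard deviation squared.

```python
p = 0.424          # national proportion, treated as the true parameter
target_sd = 0.01   # the standard deviation we want (1%)

# sqrt(p * (1 - p) / n) <= target_sd  rearranges to  n >= p * (1 - p) / target_sd**2
n_threshold = p * (1 - p) / target_sd ** 2
print(round(n_threshold, 2))  # 2442.24
```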

And I'm going to go back to them and say, if you want the fluctuation, the give or take, the standard deviation, to change so that instead of a typical value fluctuating 2% from the center it only fluctuates 1%, you're going to have to increase your sample size to 2,442.24. Only a scientist would say that, because can you have 0.24 of a human being?

No. You can't have a fraction of a person; that would be a dead person. So the question is, is it going to be 2,442 people or 2,443 people? Which is it going to be?

Well, you could say, well, rounding, if I round this to a whole number, it's going to be 2,442, but you would be wrong, because what this is saying is that 2,442.24 is the threshold that hits exactly a fluctuation of 1%. If you make the sample size a little bit smaller, then the standard deviation will be a little bit bigger, because standard deviation grows as your sample size shrinks, and standard deviation shrinks as your sample size grows. So you can't round down, even though that's the closer whole number.

This has to be 2,443. This is the number of people that gets you the standard deviation you wanted. And the truth is, when you're trying to find sample size, if you get a decimal, you always round up.

So tip: when finding a sample size for a set standard deviation, you always round up. Okay. Because if you round down, even just the tiniest little bit, you've now increased your standard deviation. You don't want to do that.
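
To see why rounding down fails, here is a small Python sketch (again just an illustration, not something from the lecture) that compares the standard deviation at 2,442 and at 2,443 people. It uses `math.ceil` to round up and `math.floor` to round down.

```python
import math

p = 0.424
target_sd = 0.01
n_threshold = p * (1 - p) / target_sd ** 2   # 2442.24, the exact threshold

n_rounded_down = math.floor(n_threshold)     # 2442
n_required = math.ceil(n_threshold)          # 2443, always round up

sd_down = math.sqrt(p * (1 - p) / n_rounded_down)
sd_up = math.sqrt(p * (1 - p) / n_required)

print(n_rounded_down, sd_down > target_sd)   # rounding down misses the 1% target
print(n_required, sd_up <= target_sd)        # rounding up meets it
```

With 2,442 people the standard deviation lands just above 0.01, and with 2,443 it lands just below, which is the whole reason for the round-up tip.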

Okay. So the steps for doing this, if you have this on your homework, which you will: you can look at this video again and then just replay the steps. The numbers will be slightly different. Get rid of the square root sign, clean everything up, and then multiply both sides until, oh, I switched it to equal, didn't I?

So if I keep the less than or equal to, that tells you n has to be greater than or equal to that decimal, so you've got to round up. Well, never noticed that before. Okay, so let's look at what we did in this video. We did a lot of things, but it does actually boil down to a really nice succinct message.

The message here is that if you are dealing with p-hats, for the sampling distribution, the center will be the true p. The standard deviation, sigma, is given by this formula, the square root of p times (1 minus p) over n. Okay, and the shape is going to be normal as long as n is big enough.

And the threshold for big enough is that np and n(1 - p) both have to be at least 10. So let's see if we covered all of this. As the sample size increases, the standard deviation decreases. So we spent a lot of time on that. For large samples, the sampling distribution can be assumed to be normal. And by large samples, we now know that's np is greater than or equal to 10 and n(1 - p) is greater than or equal to 10. So that's what large enough means.
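
The "big enough" check is easy to write down as code. This is a minimal sketch, and the function name `normal_approx_ok` is made up for illustration; the sample size of 20 in the second call is a hypothetical small sample, not a number from the study.

```python
def normal_approx_ok(p, n, threshold=10):
    """Check that n*p and n*(1 - p) are both at least the threshold (10)."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(0.424, 500))  # True: 212 and 288 are both at least 10
print(normal_approx_ok(0.424, 20))   # False: 20 * 0.424 = 8.48 is below 10
```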

Oops. Okay. And we just did a very messy look at how to find the required sample size.

It's a whole lot of algebra. Follow those steps at the end. There'll be one problem like that.

And determine if the normal approximation is valid. There, that's the check for valid.

And how do you find percentiles? You use the Dana Center tool, DCMP tool. And the sampling distribution tool is a bit of a pain.

It only gives you the lower tail, but the normal distribution will give you the upper tail. It'll give you the center part. It's a lot easier to deal with.
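
If you want to see what the normal approximation is doing outside the DCMP tool, here is a hedged Python sketch using the standard library's `statistics.NormalDist`. This is a stand-in for illustration, not the tool used in class, and the cutoffs 0.40 and 0.45 are hypothetical values picked only to show a lower tail, an upper tail, and a center part.

```python
import math
from statistics import NormalDist

p = 0.424   # center of the sampling distribution (treated as the true proportion)
n = 500     # sample size from earlier in the lecture
sd = math.sqrt(p * (1 - p) / n)

# Normal approximation to the sampling distribution of p-hat
sampling_dist = NormalDist(mu=p, sigma=sd)

lower_tail = sampling_dist.cdf(0.40)                             # P(p-hat <= 0.40)
upper_tail = 1 - sampling_dist.cdf(0.45)                         # P(p-hat >= 0.45)
center = sampling_dist.cdf(0.45) - sampling_dist.cdf(0.40)       # P(0.40 <= p-hat <= 0.45)
tenth_percentile = sampling_dist.inv_cdf(0.10)                   # value with 10% below it

print(lower_tail, upper_tail, center, tenth_percentile)
```

The three probabilities add up to 1, and the percentile comes from `inv_cdf`, which goes from an area back to a value of p-hat.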

So you're hoping that your sample size is big enough and you can just go to the normal distribution. It'll make life a lot easier. Okay. So we're done. So take a little bit of a break.

And this was kind of a long video. The good news is that the homework, I don't think, is that long. All right.

Bye you guys.
