Transcript for:
Understanding Test Statistics in Research

Hi, welcome back. It's an exciting time in the year. We are starting section 11b, and we're going to supposedly learn something new, which is something very important called a test statistic.

But guess what? I have some fantastic news for you. We've basically learned all the new concepts we're going to learn, and at this point in the semester, we're kind of just going to be piecing together old ideas and maybe calling them new things. Like today, we've got an old idea.

We're going to call it a test statistic, so stay tuned. You're going to feel a lot of deja vu when I'm talking about test statistics, and then you'll go, oh, that's just... wait for the punchline. A great thing about now is that I will be using ideas from the past in a slightly new context, but it'll be review. So we're connecting the dots and making connections. So let's go.

Share my screen, a little slow. Okay. So I think it was last time we were doing the Flint, Michigan debacle on the water supply to the city. And so you might be thinking, I better start drinking more bottled water. And that's not a bad idea. But at the same time, there are some concerns about the environment, about the planet.

It's bad to have all these plastic bottles. So let's see if bottled water is really that important. All right, bottled water companies spend huge sums of money to market their products, touting their purity, rejuvenating minerals, and superior taste. And we are going to focus on, is it really superior taste?

At the same time, environmentalists are concerned about the impact of all those plastic bottles, many of which are not recycled. So on campus, I see a lot of people carrying around those metal water bottles that you refill. I think that's just so awesome. But we're thinking about the bottled water today. Why does bottled water remain so popular?

Do consumers actually prefer the taste of bottled water or are there other factors at play? So that's the big question. So the question is, how would you design a study to test whether consumers prefer the taste of bottled water or tap? And actually, I'm going to take that back. Here's the question.

Test whether consumers prefer the taste of bottled water or tap. So we're not making a judgment. We're basically saying, is there a difference between the two? Design and draw a diagram of an experiment. Make sure to include four key components.

RRCC: randomization, replication, control, and compare. So you need to annotate. And at this point, I expect you to know this, and it will be on the final. Randomization, replication, compare, and control.

So go ahead and pause the video and think of an experimental design. I'm going to give you mine afterwards, but yours will be a little bit different than mine. You won't need to write mine down, but you want to compare yours to mine to make sure you've got all the ingredients so that you will ace this question or this type of question on the midterm.

And I'm actually going to pause my camera too, so that when you meet me again, I'm going to have my experimental design all written up, and I'll go through it and annotate it with our RRCC. So make sure to do that. And since we're pretty much going over a midterm question, I would like you to also include placebo. And what that means might be different in different contexts. Okay. Okay, I have my diagram started.

It's not a traditional diagram like we've done in the past. And I'm doing this just kind of to show you that it doesn't have to be. I'm interested in the water, comparing bottled water to tap water.

So my observational units are the cups of water. And I just got started. I'm not done. I'm going to go through the RRCC to make sure that I've actually hit everything that my teacher is going to be grading me on.

So the first R is for randomization. So let's read this and see where we can put randomization and I haven't done any of the annotating. So make a hundred pairs of cups of water, each with one cup tap and one cup bottled water.

So I've got a hundred little pairs of cups of water. Stand outside Trader Joe's and ask the first 100 people willing to taste test the pairs to identify the cup they prefer. So you might be thinking, oh, wait a minute, that's a volunteer sample.

And you're right. We live in a world where it's probably better that we don't force people into experiments. So there could be some bias, in that the group of people who are responding might have a little more time. You know, if someone asks me on a Friday night, as I'm rushing to get home and make my kids dinner, oh, would you like to do a taste test?

I'm going to say no. So there are always things to pick apart, but we want to see what's good about this experiment, and we want to make it good. And then count the number of people who preferred the bottled water. So that's my overall structure of how I'm going to do it. It is an experiment because I'm imposing the treatment: I'm giving them the water and I'm watching. But I have to be careful, because to be a really well-designed experiment, it has to have this RRCC.

So the first thing is, where does the randomization come in? Where can I put the randomization? People coming out of Trader Joe's, that's not very random. People agreeing. So there has to be a component of random.

So I'm going to put it right here. And I think you use randomization to try to mitigate bias. One bias is if people are coming out, maybe they're thirsty and maybe they'll always prefer the first cup over the second cup.

Or maybe there's a bias in the other direction. You don't want to allow that. So I'm going to say: randomly assign the order of which cup each person gets first.

You almost certainly thought of something else, but I'm like, hey, I got that R done. So replication, where's the replication?

Well, a lot of you said on the midterm that replication was repeating the experiment, which is an awesome practice, but this replication is referring to good experimental design for this one experiment. And then we'll replicate that good experiment. The replication is that you have a hundred. So I didn't say to explain, and I don't have a lot of room. So I'm just going to write replication.

You've got a hundred observational units, and each unit is a pair of cups. You don't have 200. You have a hundred observational units. So if there's any weird variation, maybe a fly flew into one cup or something, or some weird anomaly happened that made some cups not as good, you've got a hundred, and it'll end up evenly spread out among the people who are trying it.

So that's where my replication is. And then control. I need to talk about control, so I need to be explicit about control.

So where can I exert some control? Well, you exert control over whatever hidden factors might mess up how people feel about the water above and beyond the water itself. Like maybe it'll be all women. It could be the packaging.

It could be the temperature. So for control, I'm going to say: control by making sure the cups are the same in size and shape and the temperature is the same. People prefer colder.

When it's colder, it's harder to identify yuckiness. So I just got my control down. Now, there could be other control factors, but I'm paying homage. I'm letting my teacher know I know what I'm supposed to be annotating. And then, oh, I don't have compare anywhere.

So I need to have compare. That's my fourth step. So if there's no preference and they're just wildly guessing, if there's really no difference between the bottled and the tap, you would expect people to pick the bottled water only 50% of the time. 50% of the time you pick the bottled and 50% of the time you pick the tap.

So the compare is going to be the 50%. That's if all things are equal. So compare the proportion of people who identified a preference for bottled water to 0.5.

This would be the proportion you would see if there is no preference, because when you're doing an experiment, you always want to assume there's no difference.

And so by no difference, it's going to be 50-50. Okay. So I'm sure it doesn't look like your diagram. I'm sure that you could have thought of other controls. I'm sure you could have thought of another way of setting up the comparison.

This was just my experiment. And I noticed that I did not talk about placebo, and I did expressly ask about placebo. So a placebo is a fake treatment.

So I actually don't know. I mean, I guess the treatment is, I think... you know what, a placebo would be a fake treatment, like giving someone a sugar pill, or giving somebody glutinous bread when the treatment is to remove gluten from the diet. I'm going to change this into blinding, which means they don't know which is which.

So the fact that the cups are identical, that's also blinding, so that you can't tell which is which. In my mind, I'm comparing the bottled water to tap water. So the treatment is getting the bottled water, but I'm going to make it so that nobody knows which is which.

So you blind, or you introduce a placebo, as an effort to do control. So write that up somewhere. I don't have any room, and I've talked about it so that you have a sense of what I'll expect when you have a problem like this on the exam. So on the exam, I would hope that you would go with the traditional diagram.

I just wanted to show you that there are variations. Okay. So we're going to use this particular experiment to introduce you to something called a test statistic.

And a test statistic can be used to assess the strength of the evidence against the null hypothesis. So a test statistic is a standardization of the observation. And we'll get to that in a minute, but it's a way of just quickly assessing how unusual your observation is.

When working with proportions, if the sample size is large enough, the normal distribution with a mean of zero and a standard deviation of one can be the model for the test statistic. So as long as n is big enough, we're going to use normal 0, 1 instead of drawing a really complicated sampling distribution, because we're standardizing things.

And you'll see what I mean in this example. So by the end of this class, you'll be able to calculate a test statistic. Truth is, you already know how to calculate a test statistic. And you'll be able to interpret the value in context.

And it's really important that it's in context. So by in context, you're going to make an interpretation about water, bottled versus tap. You're not going to give me some generic answer. It's going to be very specific to what's going on in the experiment.

And you're going to use the test statistic to decide whether the null hypothesis about the population is a plausible explanation of the sample results. So that's where we're headed. So let's go to it. And like I said, I promise you, this is actually no new information.

It's just connecting dots from different sections that you've already learned. So you conduct a taste test of whether consumers prefer the taste of bottled or tap. State the null hypothesis, which is H naught, and the alternate hypothesis, which is H A, in words.

So, going back to blue. In words: is there a difference? There's a variation on how you can do this, but I'm going to say there's no difference in preference between tap and bottled water. Take away the fancy bottle, take away all the fanciness that wraps it all up.

People can't tell the difference. That's what I'm going to say. And oh, let's be clear: for consumers. I'm giving this to human beings.

I'm not giving this to dogs. They'll just soak up both. They'll be happy with everything.

So the people of interest are people, you know what I mean? Okay. And here, if you look at this, the question is whether consumers prefer the taste of bottled or tap.

So I'm not saying, is bottled better? I'm saying, is there a difference? So there is some sort of difference between preference for bottled compared to tap.

So I guess I should pick one brand of bottled at a time, or maybe I'll just randomly mix up a whole bunch of bottled and just say bottled overall. But I think I'm going to think about one brand of bottled, because maybe brands are different.

Okay. So notice that now in part B, I'm saying state the null and alternate hypotheses in terms of the proportion of consumers who prefer bottled water. So it is categorical. I'm not saying, on a scale of one to five, tell me how you feel about this.

I'm saying, which is better? So that's a categorical response. That's a qualitative response. So you're going to be using proportions instead of averages. So let me translate that.

I'm going to define what P is. Let P equal the proportion of people who prefer bottled.

Okay. You could have said it was the proportion of people who prefer tap. You can define it how you want, but I'm saying that P is the true proportion. If I could get every person in the world to taste the difference, what proportion would prefer bottled? So if there is no difference and they're just blindly guessing, or it's some kind of weird emotional, non-valid response, and they basically taste the same to everybody, then P is going to be 50%, because it's a 50-50 chance.

So here for H A, it always helps to know that if you're dealing with H naught, it has to be an equality. And then once I have this one, the next one is easy.

It'll be blankety blank blank, blankety blank blank. But this blank right here is identical. Whatever letter it is, it's going to be that letter. And this blank is identical. So all you need to do is decide, is the symbol in the middle less than, greater than, or not equal to?

Now, if you were suspecting that bottled is preferred more, then for the alternate you would have this, greater than. You'd be saying, oh, I'm testing to see if bottled is preferred, but that's not what we set up here. What we set up here was, design an experiment... okay.

Do consumers actually prefer the taste of bottled water or... where is it? Oh, here.

Bottled water or tap water. Which one is the winner? So you don't know. So you're going to say just not equal. So equal is the symbol for no difference, and not equal is the symbol for some difference.

There's some difference. So it could be that people hate bottled water. I tend to hate the alkaline kind. I can taste that.

All right. So there's our null and alternate hypotheses in words and in symbols. And as you probably noticed, the symbols are a lot easier to write down. And that's why statisticians gravitate towards a symbolic null hypothesis. Okay.

So consider the following four data sets that might have resulted from the taste test. Which of the two taste tests that follow provides more evidence of a preference for bottled water or tap water?

So we want stronger evidence. Which one is going to give you stronger evidence? More evidence of a preference, of a difference.

And the difference could go either way. So from here, which result, the result from test A or test B, will give you stronger support for, oh, wow, there's a difference? So pause and see what you think. And then I'm going to answer the question. Okay.

So for this one, there's a clear and definitive answer. If I look here, my P here, and it's actually a p-hat, since it's coming from the sample, my p-hat here is 17 over 30, which translates to, oh my gosh, I didn't work it out.

Oh, I don't really need to. Here my p-hat is 21 over 30. So same sample size. The sample size is 30 in each one. And here we have a greater difference. If they were identical, you would think 15 out of 30, 50-50, right?

So this one right here is further from the center. The center would be 15 out of 30. So this is a more extreme result, and so it is more evidence. This one is going to make me think, oh, it's probably not just a natural fluctuation. 17 out of 30, that's pretty close to 15 out of 30, which is dead even.

They're both equally popular. With 21 out of 30, it's going to be a little bit harder to explain the gap between the hypothesized center of 50% and what we observed. Okay. So I bet everybody had no trouble with that one.
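As a quick check of that reasoning, here is a minimal Python sketch that measures how far each sample proportion sits from the no-preference value of 0.5. The function name `distance_from_center` is made up for illustration; it is not from the course materials.

```python
def distance_from_center(count, n, center=0.5):
    """Absolute gap between the sample proportion and the hypothesized center."""
    p_hat = count / n
    return abs(p_hat - center)

# Test A and Test B both have 30 tasters, so the distances are directly comparable.
for name, count in [("Test A", 17), ("Test B", 21)]:
    gap = distance_from_center(count, 30)
    print(f"{name}: p-hat = {count / 30:.3f}, distance from 0.5 = {gap:.3f}")
```

With the same sample size, the result that lands farther from 0.5 (21 out of 30) is the more extreme one.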

So the next one: which of the two taste tests that follow provides more evidence of a preference for bottled or tap water? It could go either way. So pause this one and make up your mind. Okay. So for this one, if we look at our results again, the result for the top one is p-hat equals 22 preferred the bottled out of 50. Okay, out of 50, I'd expect maybe 25. So for this one, they didn't like the bottled water as well, but it's only off by three people. That's not really far off, right? It's not far off from 50%. But the other p-hat, this p-hat, is 44 out of 100. And oh, they have a different sample size.

They have a sample size of 100 instead of 50. Oh, let's do this. That's going to be 44%. And oh, this one's going to be 44% too.

That's the same proportion, but a different distribution. And I think I'll make this distribution a very different color.

This proportion comes from a sample size of 100, versus this proportion comes from a sample size of 50. So if I think about it, if H naught is true, that there's no difference, then P equals 0.5 is the true parameter. And if I draw this distribution, there it is right there. That's that distribution, centered at 0.5. It has a standard deviation. I don't know what that standard deviation is, but it has some sort of a give or take.

We could work it out if we wanted, but I don't want to right now. That is this distribution. This distribution down below has a sample size of 100. So is the variability going to be smaller or bigger?

If you're looking at a hundred trials at a time within each sample, are you going to get less variation or more variation? You're going to get less variation. Your sigma is going to be smaller. So maybe this is my sigma. So I'm going to get a taller, skinnier distribution on the same scale.

I'm sorry, that drawing was so terrible. And then our observation, p-hat, 44%, is in the same spot on the horizontal axis, but that is way, way more fringy. The area associated with that for the smaller sample size is way different than the area associated with that for the bigger sample size.

For the blue distribution, it's relatively further from the center. Gosh, we're having to do a lot of thought. I wonder if there was some handy thing. What if we did a z-score?

If we did a z-score for this observation of this p-hat, it would be very different for the blue distribution, because there's less variation, and it would show that it is more unusual. So I'm going to go out on a limb and say this p-hat is more unusual because the sample size n is so much larger, leading to a smaller variation, sigma, a smaller spread of the data in that distribution.
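To put numbers on that picture, here is a minimal sketch in Python. It assumes the usual standard deviation formula for a sample proportion, the square root of p(1 - p)/n, with p set to the hypothesized 0.5. The helper names `spread` and `spreads_from_center` are hypothetical, chosen just for this illustration.

```python
from math import sqrt

def spread(n, p0=0.5):
    """Standard deviation of p-hat if the true proportion is p0."""
    return sqrt(p0 * (1 - p0) / n)

def spreads_from_center(count, n, p0=0.5):
    """How many spreads the observed p-hat sits from p0 (a z-score)."""
    return (count / n - p0) / spread(n, p0)

# Same 44% result, two different sample sizes.
for count, n in [(22, 50), (44, 100)]:
    print(f"{count} of {n}: p-hat = {count / n:.2f}, "
          f"spread = {spread(n):.4f}, "
          f"z = {spreads_from_center(count, n):.2f}")
```

Both results are 44%, but the one from 100 people sits about 1.2 spreads below 0.5, compared with about 0.85 for the one from 50 people, so it is the more unusual of the two.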

So it's not that hard to compare the same result on two different distributions: just pick the distribution with the smaller variation, and it's more unusual there. Similarly, you can compare different results that live on the same distribution. Here the situation is we've got one distribution, and here's 0.5, and the first p-hat is maybe right here, and this other p-hat is over here. You can see, oh, that one's further away, it's more unusual. The further away you are from the center, the more unusual.

So that's the easiest scenario. The second easiest scenario is to compare the same result on two different distributions. But now what do we do here? 30 people participate, 21 prefer the bottled water. Okay, so I'll pick a pretty blue for that one. So p-hat equals 21 out of 30. We already saw that one.

Okay. And then the second one is, p-hat is 44 out of 100. Uh-oh, that's 44%. This one, I don't know what 21 divided by 30 is. 70%. So the top p-hat is clearly bigger, but it's going to have more variation in the data. So it's further away, but the distribution might be more spread out.

And this one right here, the observation is closer in absolute terms to the center, but we also know that there's less variation in that data. So all of a sudden I don't have any common ground to compare. This one is tricky. And this is what usually happens.

Tricky. Tricky. We're comparing different values of proportions on different distributions.

So what could we do? It's tricky to just logic it out the way we did, because there are too many differences at play. It's kind of like when I was asking you to compare my daughter Delia's height among women to my son's height among men.

What did we do when we had very different results from very different distributions? Remember, we used z-scores. Maybe that's better. It's a good idea to compute z-scores, so that you can see how far away the observations are relative to their distributions.

Or use the normal DCMP tool to compare areas, which equal probability. So calculate the area of the tail, and that's the proportion of results that will be that extreme or more unusual.
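Here is a minimal sketch of the tail-area approach, using Python's `statistics.NormalDist` as a stand-in for the course tool (that substitution is an assumption, and `tail_area` is a hypothetical helper). It models p-hat as normal and centered at 0.5, then reports the area beyond each observed result on the side where it falls.

```python
from math import sqrt
from statistics import NormalDist

def tail_area(count, n, p0=0.5):
    """One-sided tail area beyond the observed p-hat, assuming p0 is true."""
    p_hat = count / n
    model = NormalDist(mu=p0, sigma=sqrt(p0 * (1 - p0) / n))
    below = model.cdf(p_hat)       # area to the left of the observation
    return min(below, 1 - below)   # keep the tail on the observation's side

for label, count, n in [("21 of 30", 21, 30), ("44 of 100", 44, 100)]:
    print(f"{label}: tail area = {tail_area(count, n):.4f}")
```

The 21 out of 30 result leaves a much smaller tail (roughly 1%) than the 44 out of 100 result (roughly 12%), so it is the more unusual one.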

So you can do either one of those, and the result that has the smaller area is going to be the more unusual. But today, we're going to be doing this approach, the z-scores. And that's what a test statistic is. A test statistic is a z-score in this context.

But in general, a test statistic is a standardized score. So we're asking you to compare test B to test D. And so it is a good idea to visualize the distributions. Now we're using standard error for spread instead of standard deviation.

And that's because we don't really know what the true center of the distribution is. So we're assuming it's 50%, and we're using that to create our standard deviation. So if you recall, the formula for the standard deviation is the square root of p times 1 minus p, over n.

But we don't know p, so we're going to use p naught. If I go all the way back up here, p naught is what we're hypothesizing: H naught says that p is 50%. So that's why they call it standard error instead of standard deviation.

And the standard error for p-hat in this situation, with a sample size of 30, is 9%, and the standard error for the other one, when we have a sample size of 100, is a lot smaller. It's not surprising. So you can get those results by throwing the numbers into that formula.

I'm not going to do that right now, but the centers are the same, and if we were to draw this distribution on top of the other one, it would look something like this. So its spread is 5%. It's so hard for me to draw this, but it would be a lot taller and skinnier, because it has less spread, literally.

So we would expect results for this one to be about 50%, give or take 5%. And in this one, we would expect results to be about 50%, give or take 9%. So it's more spread out.
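For anyone who wants to see where the 9% and 5% come from, here is a minimal sketch in Python. It plugs the hypothesized 0.5 into the square root of p(1 - p)/n; the function name `standard_error` is just an illustrative choice.

```python
from math import sqrt

def standard_error(n, p0=0.5):
    """Spread of p-hat when the null value p0 is assumed to be true."""
    return sqrt(p0 * (1 - p0) / n)

for n in (30, 100):
    se = standard_error(n)
    print(f"n = {n}: standard error = {se:.4f}, "
          f"so about 50% give or take {se * 100:.0f}%")
```

The sample of 30 gives about 0.0913 and the sample of 100 gives 0.05, which is why the second distribution is drawn taller and skinnier.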

So same thing. So we visualized those two different distributions. Now we're going to do the test statistic, which is just a glorified term for z-score. I'm going to pause for just one minute, and I'll be back in a second, and we'll work with the test statistic. Okay, so we're going to use test statistics. Let's look at this. This is the generic form of a test statistic.

And so it's actually going to be more than just a z-score. For this class, the test statistic is going to be observation, the sample statistic. What are we using for a sample statistic?

We're using p-hat, minus the null hypothesis value. Well, that right there is the center of your distribution if the null hypothesis is true. So it's the center of the sampling distribution if H naught is correct.

Okay. And then down here, standard error, that is the spread of that same distribution. And it's a sampling distribution. So this, up top, is our observation.

So I really hope that you are getting deja vu all over the place, because that's what should be happening. So in the context of what we're doing right now, and it's not always going to be the case, it's going to be p-hat minus p, but I'm going to put a little naught on it. It used to be the true proportion, if we knew it, but now it's whatever H naught claims it to be, because the established truth gets to say what the true center is until we can show that that might not be reasonable.

That's what happened in Flint. All over, and then down here is the spread of the sampling distribution, which is the standard error, which will be the square root of p naught times 1 minus p naught, over n. But wait a minute, this is a standard error. You've got to acknowledge that this is an estimate. So this is the spread if H naught is true.
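Written out as a formula, the test statistic described here, for a single proportion, would look like the following. The symbols follow the lecture: p-hat is the sample proportion, p naught is the value H naught claims, and n is the sample size. The label SE for standard error is shorthand added for this sketch.

```latex
z = \frac{\hat{p} - p_0}{\mathrm{SE}},
\qquad
\mathrm{SE} = \sqrt{\frac{p_0\,(1 - p_0)}{n}}
```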

So I highly recommend this, because some of you are going to have problems with it. My tip is: work out the standard error first on your calculator. Keep that decimal somewhere in your calculator. You can round to four places past the decimal, or you can keep it as it is, and then go ahead and work out the rest of it.

Then calculate the test statistic. And what is it? It's a z-score. It's what we've been looking at all along. So the test statistic in this context is a z-score. It's not always going to be that, but it's going to be very close. So, no new information.

So my recommendation is, first find the proportions, and then plug them into that formula. And the only thing that's going to stay the same in both of these is that p naught is 0.5, but the standard errors will be different because the sample sizes are different. So I would like you to pause the video and calculate your p-hats.

I think we already did that, but if you didn't, do it again. So this one is going to be 44 out of 100, so here p-hat equals 0.44. And this one is 21 out of 30, so p-hat equals 0.7. So we got that already. And then if you want to shove that all in for the test statistic: z equals observation minus center, the hypothesized value, over spread. I'm just going to cheat.

Copy this image. Okay. And good.

So 0.7 versus 0.44. And it's a good idea to write this out. The center is going to stay the same. We believe there's no difference.

So there's a 50-50 chance. And then the standard deviations are going to be different. Or I should call them standard errors, because they're approximated. It's a really good idea to write your formula down: the square root of 0.5 times 1 minus 0.5, and 1 minus 0.5 is also 0.5.

All over n. The 0.5 doesn't change, but in this case, n is 30, and in this case, n is 100. So when you work that out, I think it's a good idea, especially if you come from a background where you haven't had algebra in a million years, which is totally okay, to work this out to four places past the decimal, and then go ahead and do your calculation.

So do them kind of separately. So up at the top, that's going to be 0.2. And the bottom, if you run that through your calculator... it's hard to see.

I have 0.0913. Is that right? I'll figure it out.

0.5 times 0.5. Enter. Divided by 30. Enter. And then square root it. Yes, it was.

I was correct: 0.0913. Okay. And then for the next one, you divide by 100 instead. I hope you pause and go, oh yeah, it makes total sense that that is going to be a smaller value on the bottom. And sure enough, it is: 0.05 is what I got. And then on the top, it's going to be negative, since you're subtracting a bigger number from a smaller one. If you have $44 and you spend $50, you owe somebody something.

And I just did whole numbers there, but it's negative 0.06. So I would do the multi-step process.

And so the top z-score ends up being about 2.19. And the bottom z-score is negative 1.2.
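The multi-step calculator routine can be sketched in Python like this. It follows the tip of working out the standard error first and rounding it to four places; the function name `z_statistic` and the returned tuple layout are assumptions made for this example.

```python
from math import sqrt

def z_statistic(count, n, p0=0.5):
    """Return (p-hat, rounded standard error, z) for one taste test."""
    p_hat = count / n
    # Step 1: standard error under H naught, rounded to four places.
    se = round(sqrt(p0 * (1 - p0) / n), 4)
    # Step 2: observation minus hypothesized center, over the spread.
    z = (p_hat - p0) / se
    return p_hat, se, z

for name, count, n in [("Test B", 21, 30), ("Test D", 44, 100)]:
    p_hat, se, z = z_statistic(count, n)
    print(f"{name}: p-hat = {p_hat:.2f}, SE = {se:.4f}, z = {z:.2f}")
```

Test B comes out a little above 2 and Test D a little below negative 1, matching the hand calculation.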

So those are my two different z-scores, which means those are my two different test statistics. Now, interpret in context the test statistic for test B and test D. In context means you better be talking about water. So go ahead and pause and write what you think.

And it had better be a sentence with bottled water and tap water in there. So for test B, the proportion of people who preferred bottled water compared to tap was, because it's positive, more than two standard errors above the hypothesized center. And I said standard deviation, standard error, I don't really care. So two standard errors above the center if the null hypothesis is correct.

Okay, similarly for test D, it's all about the proportion of people who preferred the bottled water. The proportion of people who preferred bottled water was just over one standard error below the center if H naught is correct. So if it's just barely more than one below, that means it's just barely outside the hump. You don't have to draw this, but there's 0.5.

It's like right here. That's where that p-hat is. Whereas this one, it's a different distribution. And it's actually a taller, skinnier distribution.

Oh no, it's a more spread out distribution, because it has only a sample size of 30. But I just want to get the idea across that, if that is my distribution, it's more than two.

It's not one, it's not two, it's a little bit more than two. So if I standardize it, if I just look at the z-scores, this one right here is further away from the center, further away from the assumed truth if the null hypothesis is true. So this is in words and this is in pictures what's going on.

Okay, write a sentence to describe how the test statistic and evidence against the null hypothesis are related to each other. So make sure I... I hit everything I wanted to and B, no, I did. If you want to throw out the established truth, like they wanted to throw out in Flint, Michigan, they wanted to reveal that the government officials were taking kickbacks, were giving awful water. You want your data to be as far away from the center as possible.

Now in Flint, they wanted to show that a much higher proportion. In this case, either end disproves, doesn't disprove, but indicates unlikely that 50% is true. So in general, as a test statistic gets further from the center, well, what is the center of a Z?

Where do z-scores live? Because we haven't even thought about where they are. This is just measuring how far the observations are from their center. Well, z-scores live on the standard normal distribution. So you probably don't remember that.

I'm going to write it down here. Z-scores, they're called standard scores, live on the standard normal distribution. And by standard, we mean the center of this distribution is zero and the spread is one. It's so much easier; it's just easy math.

The further the z is from its center, which is zero, the more unusual the data is, assuming H naught is true. Therefore, the further z is from zero, the stronger the evidence is against H naught. And that's the connection. The further z is from zero, the stronger the evidence against H naught. And the benefit of that is you don't have to bother to visualize distributions like this.
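To put rough numbers on "the further z is from zero, the more unusual the data," here is a minimal Python sketch. It assumes the standard normal model applies, and the helper name `two_sided_tail` is made up for this illustration. It reports the chance of landing at least that far from zero in either direction.

```python
import math

def two_sided_tail(z):
    """Chance that a standard normal value lands at least |z| away from zero."""
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

for z in (0.5, 1, 2, 3):
    # Larger |z| gives a smaller tail area, i.e. more unusual data under H0.
    print(f"z = {z}: tail area = {two_sided_tail(z):.4f}")
```

The tail area shrinks quickly as z moves away from zero, which is the sense in which a z of 3 is far more surprising than a z of 0.5.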

You don't have to do any of that. You can just give me the z-score. Oh, z equals three? Forget it, that's not going to happen. Oh, z is 0.5?

You don't have to imagine the sampling distribution. You don't have to imagine the observation within that sampling distribution. You just automatically know how far from its center it is. So that is why we love z-scores: because they're relatively quick and easy to compute, and you don't have to keep thinking about the actual original sampling distributions, which are all different. All the spreads are different, the centers might be different. You just standardize it, boom, you've got the results. But we have to make sure of something, because this only works if we actually have a normal distribution to deal with. So assuming that the sample size is large enough, we can use a normal distribution to model the values of p-hat (okay, we already know that) that occur if the null hypothesis is true.

I.e., you can use the DCMP tool for the normal distribution instead of the binomial, which is just such a pain. And in class activity 9c, you learned when the sample size is considered large enough. This is your check, check.

Um, so since the normal distribution is being used, um, to model the values, if the null hypothesis is true, we can check the condition replacing P with the value of the null hypothesis. So in this case with 0.5, okay, I think we knew that. So, uh, should we do that? Am I asking you to do that?

Um, I want you to do that. So go ahead and just double check all of this. You can't do a z-score, since z-scores live on the normal distribution, unless you make sure that your sample size is big enough. So go ahead and check, check.

So is n p greater than or equal to 10, and is n times one minus p greater than or equal to 10? If you know the true proportion, you just go ahead and use it. If you're dealing with confidence intervals, you substitute in p-hat because you don't have p. And if you are doing a hypothesis test, you substitute in p-naught because that's what the established truth claims is the true proportion. So our p-naught is 0.5. Oh, we've got two different distributions going on. For test B, we have a sample size of 30. So for test B, n is 30. Our p-naught is still 0.5.

And so half of that is 15. Yay. That is greater than or equal to 10. So that's a check. And then we still have n times 1 minus p-naught, and it's the same thing.

Since 1 minus 50% is still 50%, that's 15 again. So that's a check. So a sample size of 30 works. Well, if a sample size of 30 works, do you think a sample size of 100 is going to work? The other test has a sample size of 100, but I'll just go through it, who knows. n p-naught is going to be 100 times 0.5. Well, that's 50, which is clearly greater than or equal to 10. And if you just want to beat a dead horse, you can check n times 1 minus p-naught too. Is that greater than or equal to 10?

Yes, it's 50 again, and 50 is greater than or equal to 10. Check. So this was for test D. So for both tests it is valid to assume a normal distribution. So you can go ahead and use that beautiful test statistic.
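The "check, check" above can be sketched in a few lines of Python. This is only an illustration: the function name `large_enough` and the `threshold` parameter are made up here, and the threshold of 10 is the one stated in the lecture.

```python
def large_enough(n, p0, threshold=10):
    """Check both conditions: n*p0 >= threshold and n*(1 - p0) >= threshold."""
    expected_yes = n * p0        # expected count preferring bottled, if H0 is true
    expected_no = n * (1 - p0)   # expected count not preferring bottled, if H0 is true
    return expected_yes >= threshold and expected_no >= threshold

p0 = 0.5  # the null hypothesis value used in this activity
for label, n in (("test B", 30), ("test D", 100)):
    print(f"{label}: n*p0 = {n * p0}, n*(1-p0) = {n * (1 - p0)}, "
          f"large enough? {large_enough(n, p0)}")
```

Note that p-naught goes into the check, not p-hat, because the normal model is describing what happens if the null hypothesis is true.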

You can compare those two test statistics, those two z-scores. So the following graph is a null distribution for a sample size of 100. So that was, come on, I believe that we were calling that test, yes, this is test D. Okay, so it has a center of 50% and the spread, this distance right here, is going to be the square root of 0.5 times 1 minus 0.5, over 100, and that ends up equaling 0.05.

And as you can see, you take 5% away from 50, you get 45, add 5%, and you get 55. So it's give or take 5%. So I'm going to put a little p-hat here because these are our sample proportions when n equals 100. And so this is what we would expect our p-hats to be if we sampled, you know, a million billion p-hats all of size 100. So this is our sampling distribution, the sampling distribution of p-hat, how much it would fluctuate if H naught is true for n equals 100. So we'd expect each result, each p-hat, to be about 50%, give or take five percent.

There's a zero in there: 0.05, not 0.5. There we go. My bad. Give or take five percent. Those decimals are so pesky. So that's what the distribution looks like.

And identify the value of the test statistic for each numerical value marked on the x-axis. So I'm going to use a different color coding here and write the values below the x-axis. So what's the formula for the test statistic?

Well, the formula for the test statistic, I'm going to write it right here. Z equals observation minus center, over spread. And the observations are going to change, but the center of this distribution is 50% if H naught is true. And the spread, if H naught is true, is 0.05.
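Written out in symbols, the formula being described looks like the following. The notation (p-hat for the observed sample proportion, p-naught for the null value, n for the sample size) follows the lecture; the layout is just one way to write it.

```latex
z = \frac{\hat{p} - p_0}{SE},
\qquad
SE = \sqrt{\frac{p_0\,(1 - p_0)}{n}}
\qquad\text{so for } p_0 = 0.5,\ n = 100:\quad
SE = \sqrt{\frac{0.5 \times 0.5}{100}} = 0.05 .
```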

So that's our formula. They want you to do it for all of them, but the first question is the interesting one: what if the sample proportion is equal to the null hypothesis value? So this is if that's our p-hat.

If our p-hat is 0.5, we'll put 0.5 in there, and the test statistic value would be z equals 0.5 minus 0.5, all over 0.05. Well, that's just going to be zero.

So if we plug in a p-hat of 0.5, our z is zero. So that's what it works out to be. They asked you to do the other ones as well. So 0.55. If p-hat equals 0.55, well, then I'm going to plug 0.55 in there.

So my z-score for 0.55: I'm going to plug 0.55 in. Observation 0.55. Everything else is constant.

It doesn't change. Minus 0.5, all over 0.05. If you work that out, you're going to get that the whole thing is equal to one.

So if you work that out carefully, you get, and I'm going to put it up here, z equals one. And they actually ask you to do the others as well. The directions here are a little confusing, but: identify the value of the test statistic for each numerical value marked on the x-axis. So we just did this one.

We just did that one. And it's right here. And then we did this one.

And it's right here. I would like you now to do the rest of them. So do it for this one. Do it for this one.

Do it for this one, for this one, and do it for this one. So pause the camera and do that. I'm sure a lot of you already see the pattern here, but you're going to plug each value into this formula. So you're going to plug in 0.35, then you're going to plug in 0.4, and so on. And when you come back, I'm going to share my results with you, and I'm not going to have to even work it out. I don't even have to actually pause the camera. I can see that if the observation is 0.35, my z is going to be negative three.

If my p-hat equals 0.4, I can see that's not one but two standard deviations below. So here's z equals negative two.

And here, z equals negative one. Here, z equals positive two. And here's z equals positive three.
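Plugging each marked value into the formula can be sketched in Python as below. The list of p-hat values is an assumption inferred from the values mentioned in the lecture (0.35 through 0.65 in steps of 0.05), and `z_score` is a name made up for this illustration.

```python
import math

n, p0 = 100, 0.5                    # test D: sample size and null value
se = math.sqrt(p0 * (1 - p0) / n)   # spread of p-hat if H0 is true

def z_score(p_hat):
    """Observation minus hypothesized center, over hypothesized spread."""
    return (p_hat - p0) / se

# Values assumed to be the ones marked on the x-axis of the null distribution.
for p_hat in (0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65):
    print(f"p-hat = {p_hat:.2f}  ->  z = {z_score(p_hat):+.1f}")
```

Each step of 0.05 along the p-hat axis is one standard error, so the z-scores count steps away from 0.5.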

So if you actually plugged it into this formula, you would see that pattern. And it's just fortifying the fact that with the z-score you don't have to do all the work. You don't have to visualize the sampling distribution. Just calculate those z-scores and you know exactly where your particular observation is within your particular sampling distribution. And it's less work. Just love the z-score, love the z-score, and it will tell you all that information. Okay, so number six, we're almost done. So number six, I think we're switching gears and now we're just going to use the beauty of the z-score. So take a break for just a second, clear your mind.

And then, if this all works out, you got it. You got this. Okay. So actually try to work this out on your own from scratch and then come back. Okay, welcome back.

The taste test was conducted on a group of statistics students in Florida. Out of 22 students who participated, 20 preferred the taste of the bottled water. That's almost everybody.

Before you calculate anything, make predictions about the z-score, the z-test statistic. Do you think that the value will be positive or negative? Do you think the value will be far from zero or close to zero? So the null hypothesis hasn't changed. The null hypothesis is saying we believe there's no difference.

So the proportion of people who really prefer bottled water should be 50%. That is P naught. That's what the hypothesis says. And then we get this observation, P hat is 20 out of 22. Now, so for out of 22 people, if it's 50%, 50% times 22 is 11. You'd expect to see 11 people saying, and we got 20. Oh my gosh, this is so much. I'm not doing any calculations.

I'm trying to follow the rules. This is much higher, much bigger than 11 out of 22, which is what you would expect if 50% is the truth. So A, is it positive or negative?

This is much bigger. So it's going to be positive. It's going to be above the 50% because it's bigger. So do you think the value is far from or close to zero? I don't know, I haven't calculated the standard deviation, but to me, it's almost a hundred percent when I was expecting 50%.

So I'm going to say that I think it's far, the value is far. And I didn't ask you to write your reasoning down, but you can if you want: positive, because p-hat, the real p-hat, is bigger than the center, and far, because almost 100 percent of people preferred bottled. So we had volunteer students. It's possible that the two students who disagreed either couldn't tell and guessed, or maybe they're both smokers and they have no taste. Hopefully we randomly gave the cups to them in different orders, but who knows. But only two people disagreed, and that's pretty strong. I think it's going to be pretty strong.

I'm not sure though. When the sample size is 22, the standard error is almost 11%. So the standard error, and they did that by throwing it into that formula.

They calculated, they went ahead, if you want to know: the standard error is going to be the square root of p times one minus p, over n. But be careful. What kind of p are you putting in there?

You're putting this one in, p-naught. Because you always put in what the hypothesis says is true.

And n was 22. And if you work that out (if you're having any kind of calculator issues, go ahead and work that out), that's where they got that.

Why is the standard error larger than the standard error in test B and test D? What's the factor that messes it up? What makes it large is this right here.

Oh, that's almost impossible to see now, I blocked it out. The sample size of 22 is smaller than, was it 30 and 100? I think it was 30. Was it 30? Yeah, this thing is 30. Move on.
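A quick way to see the effect of sample size is to compute the standard error for all three sample sizes with the same p-naught. This is a small sketch; `standard_error` is a name chosen for this illustration.

```python
import math

def standard_error(p0, n):
    """Spread of p-hat if H0 is true: sqrt(p0 * (1 - p0) / n)."""
    return math.sqrt(p0 * (1 - p0) / n)

p0 = 0.5  # always plug in what the null hypothesis says is true
for n in (22, 30, 100):
    print(f"n = {n:3d}: SE = {standard_error(p0, n):.3f}")
```

The only thing changing from line to line is n, and since n sits in the denominator, the smallest sample gives the largest standard error.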

We're almost done. Calculate the test statistic for the sample. Okay, so notice I'm not even, I'm not doing this.

I'm not going to do it. I did just figure out the standard error. So that's good.

I'm going to use that piece of information in a minute. So I know in general that the test statistic for this chapter, z of p-hat, whatever our observation is, equals the observation minus the hypothesized center, over the hypothesized spread. And while I'm not drawing this, I am thinking about it.

There's me thinking. Okay, so I know that my center is that the true proportion is 50%. And I just calculated up here the SE of 0.107.

And I know that my observation, well, it's p-hat equals 20 out of 22. I should figure out what that is. I'm surprised I didn't ask you to work that out, but I guess I didn't. 0.909.

So there are a lot of bits and pieces, a lot of moving parts, but you're going to see that I'm going to be asking pretty much the same things over and over and over again. So 20 divided by 22, I'm just checking.

Yeah. And so now that I have it like this, it's going to be easier to calculate. So just remember, for those of you who haven't had math in a million years: you do the subtraction on top first, hit enter after figuring that out, and then you hit divide.

And then you divide by the green. So make sure you're comfortable with this, because if you walk into the next exam without being comfortable with the order of operations, it's going to be devastating. So 3.823.
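The same calculation can be sketched in Python, which also shows the order of operations: the subtraction happens first, then the division. The variable names are made up for this illustration. Two versions are shown because the 3.823 above comes from dividing by the standard error rounded to 0.107; carrying the unrounded standard error gives a slightly larger value.

```python
import math

successes, n, p0 = 20, 22, 0.5

p_hat = successes / n                   # observed sample proportion, about 0.909
se = math.sqrt(p0 * (1 - p0) / n)       # hypothesized spread, about 0.107

# Parentheses force the subtraction before the division.
z_rounded_se = (p_hat - p0) / round(se, 3)   # divides by 0.107, as in the lecture
z_unrounded = (p_hat - p0) / se              # no rounding along the way

print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}")
print(f"z with SE rounded to 0.107: {z_rounded_se:.3f}")
print(f"z with unrounded SE:        {z_unrounded:.3f}")
```

Either way the observation sits almost four standard errors above the hypothesized center, so the rounding does not change the conclusion.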

That's what I got. And so part D, so that's my Z score. And I'm like, oh, I love this.

It's not hard to calculate. Is it reasonable? Oh, wait a minute. Oh, we have a smaller sample size.

We have this sample size right here. Uh-oh, is that sample size big enough to assume that our z-score actually lives on the standard normal distribution? Because it all falls apart, the central limit theorem, if we don't meet that threshold. So is it reasonable to assume the normal distribution?

Well, we've got to do our check, check: n p and n times 1 minus p. And since we're dealing with hypotheses, we're going to use p-naught.

So now we have n is 22, and p-naught is 50%. And so that's 11. Phew, we just made it. That is greater than or equal to 10. Yes. And I know that if p-naught is 50%, the nice thing is that n doesn't change.

And 1 minus 50% is still 50%. And so it's 11 again.

So yay, we did our check, check. So is it reasonable? The sample size is big enough.

As long as the other conditions, samples were random and all the good things about collecting data were applied. I don't know these Florida students. Maybe they're not as good as you guys, but I'm going to say yes.

So the conditions were: samples are random, your sample size isn't too big (you never have to worry about that one), and your sample size isn't too small. And we did just show that it's big enough, it's not too small. So I'm pretty happy that we don't have to do the binomial. Yay.

Is the null hypothesis, no preference between bottled and tap, a plausible explanation? So given that your observation has this z-score, are you going to keep H naught? Are you going to say it's the truth? Are you going to say this is the picture of reality? Oops, too big. I'm not going to cross this out yet. Is that a good picture of reality? So the big idea here is we assume this to be true. We assume H naught to be true. There's no difference in how you feel about bottled water.

Then we go and we gather our data and we see how well the data fits inside the picture of reality. And what we have found out is, according to the z-score, that observation, which is real data, our p-hat, is almost four standard errors, standard deviations, from the center if H naught is true. Well, we assumed H naught was true. That was our only assumption.

We know our data is good. That data doesn't fit in that reality. So I'm going to give this a big no. So no, the null hypothesis is not plausible.

The null hypothesis does not appear to be plausible. And I've got to explain: because the data, which we know is real, is too unusual in this version of reality.

I'm going to put the reality in quotes. So no, big idea in hypothesis testing, you assume H naught is true. You go and you gather your data. And if the data is close to the center, oh yeah, that's not surprising. Natural variation of data.

But if the data is far out in the fringes, and here it doesn't matter whether it's big or small, that's evidence against H naught. So clearly state your conclusion for this test. Now, I did not say in context, but let's assume in context.

Is there evidence? So the conclusion always follows the template. I think I already went over this: "There is, or is not, enough evidence to support the idea that," and then this right here is HA, whatever HA says in words. You're going to pick "is" or "is not." So is there enough evidence to throw out H naught? You bet there is. So I'll pick this one. There is enough evidence to support the idea that, and then, what is HA?

Well, if you bothered to do it in words, which I think we did all that time ago, you knew: there is some sort of difference between the preference for bottled compared to tap. That there is some sort of difference in preference between bottled water and, sorry, tap water.

Now a lot of you are probably scratching your head and going, wait a minute, why can't you just say there is evidence to support that bottled water is better? That wasn't your alternate hypothesis. If you had set that up, then you would have only been looking at one side. We were going to accept values that were either way below 50% or way above 50%. So you can only put HA in. You could go back and do another experiment and set up the alternate hypothesis as bottled water is better, but that's not what you chose to do. So last question, do you think it is safe to generalize these results to other parts of the country?

Explain. So can you write your explanation? Okay.

So some of you might be thinking, well, this was a really good experiment and the results are really strong. So yes, I can extend this to other parts of the country. But no, you can't, because this result came very specifically from students in Florida. What I suspect is that Florida's drinking water is really awful. You know a place that has really lovely drinking water?

It's not Santa Barbara. I think our water tastes kind of bleachy. But Oakland, California, where I used to live, has primo drinking water.

It's lovely. And I, I don't know if that's because they had a very progressive governor, um, or mayor, no governor of California, sorry, mayor. So, um, I don't know why, um, but it's excellent there. So the results will vary. Uh, so our population of interest in this case was water.

Um, water and how people feel about the water in Florida. So no, these results are only valid in Florida. Other parts of the country may have better tap water, and there you might not be able to tell the difference between bottled and tap. I doubt there's a place where the bottled is not as good, but who knows?

I don't know. What do I know? All right. So let's go back.

And so what's the test statistic? Love our test statistic. Right now, our test statistic is our z-score. Z equals, ooh, is that too light? Z equals observation minus hypothesized center, which is going to be p-naught, over hypothesized spread, which is the SE. Look up the formula. It looks terrible, but it's not that bad.

So our z-score is a standard score, and it is our test statistic. And the further away it is from zero, the more evidence against the null. When working with proportions, the sample size has to be large enough, so you've got to make sure you do your check, check. And your z-scores live on normal zero one, the standard normal. The center is zero.

The spread is one. And I hope by now you can calculate the test statistic (well, you could do it before, you just didn't know it was a z-score) and you can interpret it in context.

Your z-score tells you how many standard errors or standard deviations, depending on your perspective, how many steps away your observation is from its hypothesized center, and you better say what that is. Below means it's smaller values, above means it's bigger values, and how far from zero. Either direction tells you evidence against the null.
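As a rough illustration of turning a z-score into that kind of sentence, here is a small Python sketch. The function name `describe_z` and the exact wording of the sentence are made up for this example; the point is only that the sign gives the direction and the size gives the number of standard errors.

```python
def describe_z(z, center_label):
    """Turn a z-score into a sentence about distance from the hypothesized center."""
    if z == 0:
        return f"The sample proportion is exactly at {center_label}."
    direction = "above" if z > 0 else "below"   # sign gives the direction
    steps = abs(z)                              # size gives how many standard errors
    return (f"The sample proportion is {steps:.2f} standard errors "
            f"{direction} {center_label}.")

center = "the hypothesized 50% who prefer bottled water"
print(describe_z(3.82, center))
print(describe_z(-1.10, center))
```

Naming the center in context (here, the 50% who prefer bottled water if H naught is true) is the part that is easiest to forget.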

And use the test statistic to decide when the null hypothesis is a plausible explanation for the sample. So the closer it is to zero, the more it's like, yeah, the null hypothesis is probably true. Okay.

So we're done. No new ideas, just a lot of review and, uh, connecting dots. So please, um, take a little break and then go do your homework.

practice exam practice tests practice assignments