Hello, welcome to In-Class Activity 11a. This is an introduction to hypothesis testing, which is actually one of my favorite parts of statistics. It is a way to take on old ideas, and if you want to try to replace them with new competing ideas, this is an awesome way to do it.
And it's a well-established statistical technique that is very powerful for putting forward new ideas. So let's go to it. I'm going to be using a case study, the Flint water scandal, actually.
And you can Google it and read all about it. And it's a perfect example of how the little man, the little woman, the little people, can challenge a giant, the establishment, a city that has a lot of corruption behind it, and win using statistics. So let's get to it. All right.
So I'm going to share my screen. There we go. Good. Okay.
So here we go. We've seen that we can make some inferences about populations by looking just at samples. So we're going to be looking at a sample and, you know, whenever I look at data, I'm going to use orange.
So we're going to be using data to try to understand a population. And the population, the actual truth, we never really know, but people make up a lot of ideas about what it is. So hypothesis testing, this is our new thing. Hypothesis testing can be used to explore whether there has been a change in a particular population parameter, or whether the parameter actually was different all along from what was assumed. So that's what we're doing: this new technique called hypothesis testing, which I love.
Okay, so here's the example. In 2014, residents of Flint, Michigan began to suspect that their water was contaminated. And it says with lead. I don't know if they knew right off that it was lead, but it was changing colors.
It was brown. It tasted funny. They turned to their city, and I'm going to use the color gray, because what's going to happen here is there's going to be a lot of acronyms, I think is what they're called.
So here we go. Here's one, DEQ. And people choose great titles, even for terrible organizations.
So this is called the Department of Environmental Quality, employed by the state of Michigan, not the people of Michigan. They conducted an investigation, and this group said, oh, we're compliant, our city is compliant with federal water safety regulations. The regulation requires that 10 percent or less of houses show evidence of contaminated water, and that evidence is 15 parts per billion of lead. The residents, so I'm going to use a different color for them, we'll use blue because they're looking at the water, the residents of the city were not convinced, and they took their own samples as part of the Flint Water Study. So I'm going to refer to the people of Flint as being different than the city.
The city of Flint, this is the establishment. So the establishment, which is the city of Flint, says that our water is fine.
There's nothing wrong with our water. Our water is fine, and it meets the federal guideline that less than 10% of houses have contaminated water.
And the people of Flint, and it's going to show up as FWS, Flint Water Study, they're going to be saying no, I don't think so. Okay, so before we get started, I want you to think about this. A water sample is contaminated if it contains 15 parts per billion of lead or more. So this is the definition
of what contaminated is. You don't really need to hold on to the 15 parts per billion; that's just the level deemed contaminated. So that's another number that shows up throughout this case study, but you don't actually need to hold on to that number. The number you want to hold on to is this 10% right here.
10%, and it's saying less than 10% of houses have lead levels this high or higher. What percentage of residents in the Flint Water Study, the people's study, returning contaminated water samples would convince you that the actual proportion for all residents with contaminated water is above 10%? So just think for a moment about what results would make you go, oh, maybe the gray established truth is not actually accurate. So write down what you think it would be.
So for these observations, I would say 10% is kind of the line in the sand. And so what would convince me would be numbers higher than 10%. So you could have said 15% or 20% or 25%. Any of these, all of these, might make me think.
That's me thinking: water quality might be worse than the city government says. Might be. I don't know.
Certainly not these: if we had results over here, if we got a sample with 9% or 7% or 6%, that would support the idea that, oh, this city is doing really well. Any of those would be helpful to support the city.
Any of these higher ones would be helpful to support that maybe the people, the blue people, the residents, are actually correct. Okay, so after we're done, you'll understand that low-probability events can be considered as evidence specifically against the null hypothesis.
So low-probability events make you think the null hypothesis is false, in support of the alternate hypothesis. You hold up, oh, my new idea seems better. And what you'll be able to do is identify, in context, whether sample statistics would serve as evidence against the null hypothesis, and construct null and alternate hypotheses for hypothesis testing.
So that's where we're going. Okay, here it goes. So... The established truth gets to say what the truth is.
The establishment says what the truth is. So one truth for our society is that everybody is treated equally and fairly in a court of law. That's the established truth. It's up to organizations that challenge that truth to find overwhelming evidence to say it's not true.
Say the Black Lives Matter movement revealed some of that strong evidence, such as the film footage of how Black Lives Matter people were treated in protest. But I'm getting off the subject. The onus is on the people of Flint.
So these are the people of Flint, the Flint Water Study, to convince the U.S. Environmental Protection Agency, who at the time was the Trump... So it was up to the people of Flint to convince the federal government and other officials that the city of Flint was actually not in compliance.
In other words, it was their responsibility, the people of Flint, to provide convincing evidence against the current assumption. The current assumption, this is what the city says, is that under 10% of residents in the city have contaminated water. It's up to the blue people to convince the federal government that that gray claim is false.
That's what they're hoping to do. What is the population parameter of interest in this situation? When you're answering questions like this, it's important, especially moving forward in the semester, that you are really clear about population parameters. For population parameters, you're going to have two choices.
You're going to have p, the population proportion, or you're going to have mu, the population average. So when I ask you what the population parameter of interest is in this situation, you're going to need to pick one of these, and then you're going to need to carefully define it. So pick from these two, and then describe it, making sure to mention the observational units
and the population of interest. What are the observational units? What are the blue Flint people actually going to measure? Remember, the onus is on the Flint people to show that the water is not compliant.
And I'm going to give you a hint. They're going to be going from house to house to house measuring water. So the observational units I would recommend are houses. It's actually the water in the houses. So go ahead and pause and try to do this to the best of your ability.
Is it a proportion? Is it an average? A proportion of what? Okay, so I'm going to say, and you could have variations on this, I'm going to pick purple here: the true population proportion of houses in Flint, Michigan that have contaminated water. And it helps to know what that 10 percent they're talking about refers to, this 10% right here: homes yielding water samples with lead less than, blah, blah. That's what I was looking at when I did that. What are the government officials talking about? They're talking about water samples coming from houses. So you've got to mention proportion somewhere, you have to talk about contaminated water, and you'd better mention Flint, because this is not a study about the whole world. So there's variation in how you can answer this question, but if you don't have proportion, and you don't have contaminated water, and you don't have Flint, Michigan or Flint mentioned, then you missed a part of it.
So that's a parameter we're interested in. The compliance threshold for water safety is 10%. So that's 10%.
So if the government officials of Flint hit 10%, they're good. Let's suppose worst case scenario that Flint met the threshold. Considering that situation as a hypothesis test, what's the null hypothesis?
So I wrote a lot of words up there, but where is the null hypothesis? The null hypothesis, H naught, is always p or mu equals a number. And that number is usually what the established truth says to be true.
So in this situation, I'm going to be using p. The equals sign is always there. So it's always going to be H naught: something equals something.
And I just need to pick, is it p or mu? It's p. It's this one. p equals, and what does the establishment claim is the true proportion? It's going to be 10%. Now, actually, they said less than or equal to, but you try to disprove them by picking the worst-case scenario, exactly 10%, and going from there, because if it's less than that, we're really happy.
So that's that. And so this is always the format of the null hypothesis. The residents who led the charge in the Flint water study, the good people of Flint, were convinced that there were higher proportions of homes with contaminated water than what was allowed by the government. What is the alternate hypothesis?
So for the alternate hypothesis, you're always going to have H A. You pick your same letter, and it's going to be greater than, less than, or not equal to the same number. So since we had a p here, we're going to put a p here. So it's going to be
H, oops, not naught, H A for alternate. I know that since I have 10% here, I'm going to put 10% here. I know that since I have a p here, I'm going to put a p here. And now I just have to figure out what kind of values. They're saying that they think it's higher. So that means they think the proportion is higher. So I'm going to have that open mouth facing the p.
So it's p greater than 10%. Okay, if the Flint Water Study, the good people of Flint, took a sample, describe what would constitute strong evidence that the government of Flint was not compliant with the federal safety guidelines, i.e., strong evidence against the established truth that the city was compliant. And also it has to be in support of a higher proportion. So let's see, problem number three.
So first of all, if we're trying to replace an old idea about a proportion, what kind of data is going to do that? It's going to be a p hat, right? So I'm going to say p hat, and now just describe it. It's going to be equal to the proportion of houses in the good people of Flint's FWS sample
that, what's our focus, have contaminated water. Okay, let's see if I'm answering this question. For it to be strong evidence against the claim, this p hat should be greater than 10%. That's what we're hoping for. I mean, we're not hoping for it. And how much greater it has to be to make it really surprising is the question. So that's evidence the city was not actually compliant.
I think I answered that question. So this is what we're looking for. We're looking for a p hat. I mean, we're not going to mess with the data, but we're going to get a proportion from a sample.
And here it is. This is what actually happened and this is all real. I'm not making this up.
The good people of Flint, the Flint Water Study people, took a sample of 271 homes throughout the city. So they did a random sample. They did all the good techniques that we studied this semester.
They found that 20% of the houses within their sample had contaminated water. So what was our p hat then? And this might actually confuse some people.
Our p hat was 20%. That's our p hat. They just gave it to us. We didn't have to do any calculations on that.
What was the sample proportion? That's what it was. Just to backtrack, where did they get that from?
Well, it was 20%. I just want to do the calculation so people know where it comes from. 20% of the total in the sample. So I'm going to convert that. 20% as a decimal is 0.2.
And 0.2 times 271 is going to be, and it kind of surprised me when I did this, 54.2 houses. 54.2 houses had contaminated water. So really what they got was p hat equals that count over 271, and you can't have 0.2 of a house. I mean, I guess you could have an apartment building, but I don't think that's what happened. So I think it was 54 houses and they did some rounding, is my guess. But however it is, to answer this question, what was the sample proportion?
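As a quick check of that arithmetic, here is a minimal Python sketch. The sample size and the 20% figure come from the video; the rounded count of 54 is the guess made above, not a number confirmed by the study, and the variable names are just for illustration.

```python
# Recover the count of contaminated homes from the reported percentage.
n = 271            # homes sampled (from the video)
reported = 0.20    # reported sample proportion (from the video)

raw_count = reported * n      # 54.2, which is not a whole number of houses
count = round(raw_count)      # assumption: the study rounded, so about 54 houses
p_hat = count / n             # sample proportion implied by 54 out of 271

print(f"0.20 * {n} = {raw_count:.1f}")
print(f"assumed count = {count}, p_hat = {p_hat:.4f}")
```

Either way, the sample proportion rounds to 20%, which is the number the rest of the activity uses.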
There it is. The question now is, is that significantly higher than what the city claims? The DEQ, and when I was first reading this I had trouble going back and forth with all these letters, that's the city officials. The city officials claim that the city was compliant with federal guidelines and only 10% actually had contaminated water, according to these old gray guys and women. In our corresponding null hypothesis, what was the assumed population proportion of residents with contaminated water? Well, it's their claim: p, the true proportion, equals 10%. So we've got the true proportion equals 10%. This is the claim coming from the city officials,
and because they're the city officials, they get to say what the truth is. And this is the data, data from the residents, who fortunately had taken a stats class and knew how to get data, good data. Okay, how likely would it have been for the people of Flint to obtain a sample proportion of 20% if the city is actually compliant? We can use statistics, the sampling distribution of sample proportions, and the normal distribution to determine the probability.
So even with just one semester of statistics, we can really measure how unusual this result right here is. Since the normal distribution is continuous, it does not make sense to consider the probability of obtaining a sample proportion of exactly 20%. I've talked about this in class: what proportion of women are exactly five foot four? Even though it's the most common observation, there are millions and millions of women, and all of us are a little bit taller or a little bit shorter, so the probability of a woman being exactly five foot four is zero. The probability of the housing statistic being exactly 20 percent is zero, so we're not going to look at that. Instead, since the sample proportion was high, we'll consider how likely it was to get that proportion or higher, because all of those observations would support the people of Flint's claim that the water is contaminated
at a higher level than is acceptable. In other words, how likely are we to have a sample proportion that falls in the high range of 20% or more if the true proportion is 10%? Well, they did something similar to our Dana Center Math Pathways tool and they drew a picture of reality. What they did was they said, oh, by the central limit theorem, we know what a p-hat distribution looks like with n equal to 271. Oh, that looks bad. Looks like it's p-hat n.
I'm trying to see. So this is our sample proportion p-hat where n is 271. They drew a distribution. If the city is correct, then the true proportion is 10%.
That's the city's claim. And they went and plotted their observation right here, p hat equals 0.2. And they calculated how likely it would be to see that observation or something more extreme.
So they just used the tools we've been using. And they got that this probability, that area, is very small, meaning the chances of seeing that observation, given that picture of reality, were really small. So that's what they did.
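The area calculation described here can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the study's actual software: it uses the normal approximation with standard error sqrt(p0(1 - p0)/n), and the names `p0`, `se`, `z`, and `tail` are made up for this example. The number it prints may not match the decimal on the slide digit for digit, but it lands in the same "extremely tiny" territory.

```python
import math

n = 271        # sample size (from the video)
p0 = 0.10      # proportion claimed under the null hypothesis
p_hat = 0.20   # observed sample proportion

# Standard error of p-hat if the null hypothesis is true.
se = math.sqrt(p0 * (1 - p0) / n)

# How many standard errors the observation sits above the claimed 10%.
z = (p_hat - p0) / se

# Upper-tail area of the standard normal, P(Z >= z).
# erfc keeps precision for very small tails better than 1 - cdf would.
tail = 0.5 * math.erfc(z / math.sqrt(2))

print(f"se = {se:.4f}, z = {z:.2f}, P(p_hat >= 0.20 | p = 0.10) = {tail:.2e}")
```

The tail area is the shaded region to the right of the observation in the picture: the chance of a sample proportion of 20% or more if the true proportion really were 10%.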
What can you conclude about the probability? What can you conclude about this probability? Would it have been very likely for the people of Flint to obtain that sample proportion or higher if the city were actually compliant?
That's the question. Well, I want to understand this decimal here. I know we love decimals because we don't have to do all the fraction math, but I think sometimes it's a really good idea to think about what that decimal actually represents.
Each zero represents dividing by 10. So divide: one, two, three, four, five, six. So this really is two out of, and I'm going to give it six zeros. One, two, three, four, five, six. I'm going to pop the commas in so that I can understand this. What this is saying is, let me back up.
If the city officials of Flint are accurate that the true proportion of contaminated houses is around 10%, we'll give them a little bit of wiggle room, then, and I'll put this in red, we would expect to see results as extreme as p hat equal to 20 percent only two times in a million.
So could we get a result like that? Yeah, we could get a result like this. It could happen, because with natural variation in data we know there's some fluctuation in our p-hats, but we would expect to see this result only two times out of a million samples.
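One way to get a feel for how rare that is: pretend the city is right and simulate many samples of 271 homes. The sketch below is only an illustration. The number of simulated studies (20,000) and the random seed are arbitrary choices, not anything from the video, and it draws each home directly as contaminated or not rather than using the normal curve, so its rare-event rate need not match the slide's decimal exactly.

```python
import random

random.seed(0)          # fixed seed so the run is repeatable

n = 271                 # homes per simulated sample (from the video)
p0 = 0.10               # the city's claimed proportion
threshold = 0.20        # the observed sample proportion
num_samples = 20_000    # hypothetical number of simulated studies

extreme = 0             # how many simulated p-hats reach 20% or more
largest = 0.0           # the biggest simulated p-hat seen
for _ in range(num_samples):
    # Each home is contaminated with probability p0 if the claim is true.
    contaminated = sum(1 for _home in range(n) if random.random() < p0)
    p_hat = contaminated / n
    largest = max(largest, p_hat)
    if p_hat >= threshold:
        extreme += 1

print(f"samples with p_hat >= {threshold}: {extreme} out of {num_samples}")
print(f"largest simulated p_hat: {largest:.3f}")
```

Simulated p-hats do wander around 10%, which is the natural fluctuation mentioned above, but with a probability this small you would expect essentially none of a few thousand simulated samples to get as high as 20%.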
So I'm going to say, would this be likely? Would this have been likely for them to have this? No.
This result is very, very unlikely. And that's the punchline, is we plotted our observation inside the reality that the established city put forward as the truth, and it almost never would happen. What are the two possible explanations for a high proportion of residents with contaminated water?
So, two possible answers. One is going to be, oh, sample variability, natural fluctuation in the data.
So being 10% off, that's probably just, you know, p hats fluctuating around the true proportion. But then the second possible explanation is that the city is actually not compliant. Their claim is false. Those are the two possible realities.
And so which one do you think it is when you look at the picture? Considering the large proportion of residents with contaminated water found by the good people of Flint, what would you conclude about the water in Flint based on the sample?
So there are two explanations for why we got the result we did. What are you going to conclude? So you've really got a choice for number seven.
You can keep H naught that the claim is true that there's only 10% or you can reject it. And so you've got a decision to make. So I'm going to, while you're thinking about that, I'm just going to write down what I know.
So I'm looking at the data, p hat equal to 20%, and it does not fit inside the, and I'll put it in quotes here, "reality" presented by who? The city officials.
And I want to draw that picture again, because we've been looking at these pictures. Here is the reality: since we've got such a big sample size, n equal to 271, we know from the central limit theorem that the sample proportions are going to have a shape that's normally distributed. And we know that the true proportion is going to be at the center. So the city officials say the true proportion equals 10%. And that gives us a whole picture of reality. And actually, oops.
I don't know if I can actually put our p hat in this picture. I suspect we can't even. I bet it's more than three standard deviations away. So what are we going to decide?
Are we going to decide that the picture is valid? For our decision, all we can do is keep H naught or reject H naught in favor of H A.
And those are our choices. And I think I'm going to change this word, "conclude," to "decide." Because when you're deciding, it's all about, do I keep H naught or do I reject H naught?
I'm looking at this picture and I'm going to reject. Reject H-naught. Just seems too surprising.
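The hunch that p hat sits more than three standard deviations from the claimed 10% can be checked directly. The sketch below is an informal illustration: the function names `z_score` and `decide` are made up here, and the cutoff of 3 simply mirrors the "more than three standard deviations" remark. It is not a formal significance level from the video.

```python
import math

def z_score(p_hat, p0, n):
    """Number of null-hypothesis standard errors between p_hat and p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def decide(z, cutoff=3.0):
    """Informal rule for a 'greater than' alternative.

    Reject H0 when p_hat is more than `cutoff` standard errors above p0.
    The default cutoff is an illustrative assumption, not a standard.
    """
    return "reject H0" if z > cutoff else "fail to reject H0"

z = z_score(p_hat=0.20, p0=0.10, n=271)
print(f"z = {z:.2f} -> {decide(z)}")

# For contrast, a hypothetical sample at 11% would not be surprising at all.
z_small = z_score(p_hat=0.11, p0=0.10, n=271)
print(f"z = {z_small:.2f} -> {decide(z_small)}")
```

Note that the only two outputs are "reject H0" and "fail to reject H0"; there is deliberately no "accept H0" branch.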
I'm going to conclude that the old claim is wrong and state that the established claim is wrong. It was later shown that they were actually lying, and people went to jail for this because they were getting money.
So in the context of hypothesis testing, would you reject or fail to reject the null hypothesis? And so this is where you really only have two choices: keep H naught or reject H naught. And what it says in fancy terms is, would you reject or fail to reject?
And in this case, I would reject big time, reject H naught big time, with a bunch of exclamation points here. Those are the upside-down exclamation points. I would reject H naught.
And I left a lot of space here. Why did I do that? What I want is I want you to say a conclusion.
It doesn't actually say so, but I'm going to conclude in context. I want that from you. So just like describing the parameter, you want to give a conclusion in context. The decision is all about, do I keep or do I reject H naught? The conclusion is all about the alternate hypothesis.
It's always going to be, was the evidence there to support H A, or was there not enough evidence to support H A? And in your conclusion, you really want to describe what those ideas were. So what does the data show? I'm just going to go ahead and write my conclusion. Conclusion.
Was there enough evidence? The data shows that there was enough evidence to reject H naught and support the alternate claim.
And so what was that alternate claim? Just go ahead and describe it. That the true proportion of houses in Flint, this isn't the whole world, with contaminated water.
was what? Is greater than 10%. We're not going to say that it equals 20%. We didn't prove that.
We just showed that a result of 20% would be too unusual if the claim of 10% were true. So this is one conclusion in context. So you always want to make sure that you're in context.
So by in context, any good conclusion, you better describe your true proportion. So there's my description of true proportion. And then you're either going to say that the evidence does or does not support.
Just like, I mean, I don't know if you remember a long time ago when O.J. Simpson was on trial to see if he was guilty or not guilty. So the H naught, the assumption, was that he's not guilty. The prosecution was trying to prove that there was enough evidence to show he was guilty. So we failed to reject the not guilty. That doesn't mean that he was proven innocent. So here, it's the opposite.
We're like, wow, the competing belief absolutely seems true. But make sure to describe your parameter. Okay. And also your population of interest.
So that should be in there. And then whether or not the evidence was there or not there. And in this case, the evidence is certainly there. Okay.
So, for number nine, notice that in conducting this hypothesis test we began by assuming the null hypothesis was true, that the old city officials were not lying. Then the good people of Flint, I know I'm a little biased here, collected evidence to support the alternate hypothesis. The possibilities are that they either got enough evidence to reject H naught or they did not get enough evidence to reject H naught. You never say that you proved H naught. We did not prove that O.J. Simpson was not guilty. We just showed that there wasn't enough to say he was guilty.
As we saw from the preview assignment, the only possible conclusions for H-naught are to reject the null hypothesis or to fail to reject. There's never a conclusion that we accept the null hypothesis. We never say, oh, we've proven this.
No. In your own words, describe the only possible conclusions for this hypothesis test. So we already have the conclusion, but I want you to go back in time and say, what were the two possible ones?
Explain why it would not be valid to conclude that Flint was in compliance. Well, it's a little confusing because the data is so overwhelming. But let's back up: before we look at the data, what were the two possible conclusions?
So one conclusion was if the quote-unquote evidence, the data, is not strong enough to support H A. So we've got that scenario. And then we have the possibility that the evidence is strongly in favor of H A.
So you've got two possible choices: not strong enough, or strongly in favor of H A. Well, if it's not strong enough, there is not enough evidence to support
H A. So we fail to reject H naught and we keep H naught. We keep the old ideas unless there's overwhelming evidence. So, you know, the claim used to be that smoking doesn't cause cancer.
It took us so long to prove that not to be true because we needed overwhelming evidence. So the other possibility: there is enough compelling evidence to support H A.
So then what we do is we happily reject H naught and replace it with H A. So what are the possibilities? We fail to reject H naught and we keep H naught, or we reject H naught and replace it with H A. But you usually need overwhelming evidence to do that. It's hard work to change established, quote-unquote,
beliefs, but it can be done. It can be done. And the people of Flint did it, and they got their water back, and they got a few government officials to actually be put in jail.
And they are currently serving jail sentences because they got some kind of financial kickbacks by switching the water source. So, um, based on the results of the good people in Michigan, as well as another study, the city switched back their water supplies to a previous source and the government declared a state of emergency for a few months where they got bottled water and stuff like that. So it wasn't a win for them because people actually suffered. Lead can be permanently damaging to people, especially children.
Summarize how the good people of Flint provided evidence that contributed to these decisions. So I'm going to now summarize what they did.
And this summary is going to be what you should be doing for all of hypothesis testing. I'm going to do it in blue because no, I'll do it in green. Okay.
So step number one, what did they do? Well, clearly state two competing ideas.
H naught: p equals 10 percent, where p is the proportion of contaminated houses. H A: p is, in this case, greater than 10 percent. And in other studies it could be less than, or it could be not equal to.
Those are all possible. But we're talking about what these guys did, and this is what they did. And this idea right here is the established truth. And this idea right here is the competing new idea.
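Written out in symbols, the two competing ideas look like this. This is just a typeset version of what is on the board, with p standing for the true proportion of houses in Flint with contaminated water; the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
H_0 &: p = 0.10 && \text{(established claim: the city is at the compliance threshold)} \\
H_A &: p > 0.10 && \text{(competing claim: more than 10\% of houses are contaminated)}
\end{align*}
```

In another study the second line could instead read $p < 0.10$ or $p \neq 0.10$, but the letter and the number always match the first line.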
Then what did the people of Flint, Flint, Michigan do? They gathered data. And if you don't gather data the way we talked in class, people are going to pick it all apart.
So what they got was p hat equal to 20%. And if they didn't know what they were doing, someone could say, well, that's just natural fluctuation in the data. So in order to head off that wrong claim, the third thing they did was they used probability and statistics
to place their observation inside the establishment's picture of reality. In this case, that's the city government officials, who turned out to be corrupt.
And so if we look at that picture, I'm going to keep it in gray because it's what the officials say. Oops, there it is. Central limit theorem.
10% has to be here. We didn't really calculate this. You don't have to. They used software. And what they got is their observation somewhere over here.
p hat equal to 20%. They only did one sample, and they put it in that one picture. They then calculated how likely it is that we would see this observation if the established claim is actually true. And what they got, and I'll mark it.
Oh, it's kind of hard to mark because I made it so tiny. I'm going to lift this tail up just a little bit, and that's right here: the probability that p hat is 20 percent or greater.
Assuming that the establishment's claim is true, it was 0.0000-something, one, two, three, four, I mean, who cares after a while, a tiny area. The total area is one, and that's the area where you get that result or bigger. And then what they did was they took it to a federal judge. And that federal judge rejected H naught in favor of H A. So they showed that the data supported their claim.
And then the last thing is the conclusion in context. And you can look up above to see what that conclusion is. But basically, the data overwhelmingly shows they rejected this picture because you got your data and you have the claim of what's true.
And the claim of what's true is an idea. The data is real. You only get one p hat, but it so doesn't fit in this picture. So then what do you do? You know your data is real.
You reject the picture. But you can only reject the picture if the data is so overwhelmingly out of sync with the picture. And your data is good data.
Okay. So that was the conclusion. And I'm going to put a smiley face because usually researchers are happy if they can reject H-naught in favor of HA.
If you're a scientist, it means you get to publish and then you get tenure. If you are a pharmaceutical company, it means that you're replacing an old medicine with maybe new medicine that will certainly give you a lot more profit. And maybe it'll help people who knows.
And in this case, the people of Flint were able to challenge authority and show that they were lying. And it did show in a court of law that they were lying. So going back here: low-probability events, that was our p hat, are considered evidence specifically against the null, and the null was our picture of reality, according to what the claims were.
Identifying, in context, whether sample statistics would serve as evidence: if it fits in your picture of claimed reality, then it doesn't help you. If it's far away from the center, then it does. And then we've talked about constructing a null and alternate hypothesis.
And for your homework, the null hypothesis is always H naught: mu or p equals a number. And H A is always that same p or mu being either greater than, less than, or just not equal to the number. That's what the null and the alternate look like, so you'll need to know that for your homework. The letters are the same. If you start with a p, you're going to have a p.
And the numbers are also the exact same number. So the only thing that's different is how you're disagreeing with the claim of equality. And you have to read the context of the claim carefully to know which it is. Okay, good luck.
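To see how the three forms of the alternate hypothesis change the calculation for a proportion, here is a small Python sketch. It is an illustration under the normal approximation only: the function names and the strings "greater", "less", and "not equal" are made up for this example and are not from the course software.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def one_proportion_z_test(p_hat, p0, n, alternative):
    """Return (z, p_value) for H0: p = p0 against the chosen alternative.

    alternative is "greater" (HA: p > p0), "less" (HA: p < p0),
    or "not equal" (HA: p != p0). The same p0 appears in H0 and HA.
    """
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    z = (p_hat - p0) / se
    if alternative == "greater":
        p_value = normal_upper_tail(z)            # area to the right of z
    elif alternative == "less":
        p_value = normal_upper_tail(-z)           # area to the left of z
    elif alternative == "not equal":
        p_value = 2 * normal_upper_tail(abs(z))   # both tails
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'not equal'")
    return z, p_value

# The Flint numbers from the video, with the 'greater than' alternative.
z, p_value = one_proportion_z_test(p_hat=0.20, p0=0.10, n=271, alternative="greater")
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
```

The direction you pick should come from reading the claim in context, before looking at the data.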
And do your practice. Oh, one more thing. I wanted to share a quote with you.
It's from Audre Lorde, who is a very famous, very well-respected African-American lesbian poet. And she wrote, I mean, more than poems, essays too.
She is famous for saying the master's tools will never dismantle the master's house. So I think she's saying there, you've got to break the system. And I'm not going to dispute that perspective, but I will say that if you learn statistics, statistics is a tool that can dismantle old ideas, particularly hypothesis testing.
So I hope I conveyed that in this video, and I hope you have a good time with the practice. All right, talk to you later.