Transcript for:
4.4 One and Two Tailed Tests Explained

Okay, here it goes. Heads I win, tails I lose. Right? No, wait a second. This coin has tails on both sides. Now, this is obviously not the most helpful way to make a decision, but another kind of tail can really help you out in a pinch. Take Joan and Noah, two eager residents in the town of Nakamstiff. Bear with me here, but these two upstanding citizens are both really interested in a thrilling topic: traffic. See, Joan and Noah both have similar questions about traffic, but they're going to find answers to their questions in different ways using tails. And I don't mean the kind that failed me when my producer and I flipped a coin over whether I have to wear this hat. Hi, I'm Sabrina Cruz and this is Study Hall: Real World Statistics. Hypothesis testing is how we take a hunch and throw some data at it to see what sticks in order to see if our perceptions match reality. In our episode on distributions, we lay out exactly how to set up a hypothesis test. But we're about to make them even more powerful by focusing on tails. In statistics, the tails of a distribution are the extreme ends. They tell us how quickly the probability of extreme values approaches zero. For instance, consider Joan, who's trying to choose an apartment and is torn between two very similar neighborhoods, Median Heights and Modal Manor. Ever conscious about the environment and her ability to get around easily, she wants to know whether one of them has more road traffic than the other. Her first consideration was whether there was a restaurant with the best tuna casserole. But since both neighborhoods have one, she's focusing on the traffic angle. Joan could flip a coin, but she's not playing around with a risk like the one I just took, and all she really wants to know is if the neighborhoods are the same. So, she looks straight to hypothesis testing for her investigation. To start, she needs a definition of the level of traffic.
She already knows that the average number of vehicles on the road on a given day, or average daily traffic, is how transportation planners get a handle on that sort of thing. In fact, the city's road monitoring network has sensors on the roads recording precisely that variable in the two neighborhoods over time, and she gets the data from their traffic recordings. Using skew and kurtosis, she quickly checks that the data do in fact look like normal distributions. No matter where you go, it's following you. This allows her to conclude that the sampling distribution is also normal. To check if the levels are the same, she needs to come up with a specific prediction that she can test. And when a test involves measuring whether a difference occurs along two tails of a distribution, we call it a two-tailed test. No, people, not like the two-tailed coin ruining my life. The green curve represents the distribution of traffic measurements in Modal Manor and the purple curve represents the trends in Median Heights. Joan's null hypothesis is now that the average traffic levels are not different, meaning that the means of the two curves essentially overlap. Her alternative hypothesis, meanwhile, is that the mean of traffic levels in Median Heights is greater or less than the mean of the traffic levels in Modal Manor. It's not the minimum threshold that we're looking for in general. But here the test for this hypothesis now comes down to whether the mean for Median Heights falls in either the left or right tails of Modal Manor. But the two tails could be thought of as being for the purple or the green curve, since whether the mean of one falls on the tails of the other ultimately comes down to the same thing as there being some kind of difference. And the stricter Joan makes her test by defining a smaller significance level, the less of each tail her critical region covers.
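To put rough numbers on that idea, here is a minimal sketch in Python. It assumes the test statistic follows a standard normal distribution (a z-statistic), and the function name `two_tailed_critical_z` is made up for illustration. It shows how the two-tailed cutoff moves further out as the significance level shrinks.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Return the cutoff: a two-tailed test rejects the null when |z| exceeds it."""
    # A two-tailed test splits alpha evenly, so each tail holds alpha / 2.
    return NormalDist().inv_cdf(1 - alpha / 2)

# Stricter tests (smaller alpha) push the cutoff further into the tails.
for alpha in (0.10, 0.05, 0.01):
    cutoff = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject if |z| > {cutoff:.3f}")
```

At a 5% significance level the cutoff is about 1.96, and tightening to 1% pushes it out to roughly 2.58, so less of each tail counts as the critical region.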
That makes sense since we're essentially demanding that the mean of one distribution ends up further away from the mean of the other to detect a difference. Now Joan uses a significance threshold of 5%, which is the typical standard. She finds her data are compatible with the null hypothesis, meaning she does not reject it. She concludes that the data don't support the idea that one neighborhood has more traffic than the other. So Joan can happily make her apartment choice based on other factors like which apartment she vibes with most. We love that for you, Joan. But taking a step back from Joan's victory lap, let's consider Noah, an urban planner working in the town of Nakamstiff. He's also interested in the traffic at Median Heights for very different reasons. He was involved in a new neighborhood park installation, and the people living in Median Heights, well, they have strong opinions. Some people said that the construction work blocking the road and the influx of cars would increase traffic. Others said creating a space that encouraged people to walk through the park rather than driving through the area would actually lower traffic. In the end, the project went ahead, but Noah wants to know whether the park really did influence traffic levels in any way whatsoever, so he knows for next time. We already know how we can use hypothesis testing to figure out if two things are significantly different from each other. But Noah's question actually has another component. He wants to know specifically if Median Heights has more traffic now than it did before, which means his hypothesis test needs something else: a direction. Now, we don't mean which way the traffic was flowing, but rather the direction the data travel in our hypothesis relative to our null hypothesis. Stick with me here. Nakamstiff monitors traffic near all of its parks, and Noah can easily use that data to help him figure this out.
The population he's now considering consists of the distribution of measurements of average daily traffic from the sensors 3 months before the park was built and 3 months after the park was built. That span covers all the little differences in weather and activity that might otherwise have thrown off the analysis. But if we think about what Noah's alternative and null hypotheses look like and how we might tell them apart, we see a different scenario than what we saw in Joan's case. His alternative hypothesis is that the amount of traffic increased after the park was created. And his null hypothesis is that it hasn't changed at all or is less now. In this situation, the distribution of car congestion at Median Heights post-park, the red curve, has greater traffic than before, the blue curve. So, we'd expect Median Heights' curve post-park to be shifted more to the right than before because the values of traffic measurements would be higher on average. Noah's null hypothesis is that traffic is either the same as before or that Median Heights has less traffic after the park was created. Basically, so long as the red curve winds up on top of the blue curve or somewhere to its left, that would fit his null hypothesis that Median Heights does not have more traffic after the park was created. The goal of a statistical test is to figure out if the data support or don't support the alternative hypothesis. So, in Noah's case, we have to see if the data tell us that traffic post-park in Median Heights is more than what it was before, which would support the alternative hypothesis and call the null hypothesis into question. But if the data tell us that traffic post-park in Median Heights is consistent with what it was before, we wouldn't have enough evidence to conclude that there's more traffic now in Median Heights. Using our visual, what Noah is really testing is whether the test statistic falls into our critical region.
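A check like Noah's can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the daily traffic counts below are invented, the function name `one_tailed_z_test` is hypothetical, and it uses a normal approximation (a z-test) rather than whatever test Noah actually ran. With samples this small, a t-test would be the more careful choice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def one_tailed_z_test(before, after, alpha=0.05):
    """Test the alternative 'mean(after) > mean(before)' with a normal approximation."""
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    z = (mean(after) - mean(before)) / se
    # Only the right tail counts: how likely is a z this large or larger under the null?
    p_value = 1 - NormalDist().cdf(z)
    # The whole of alpha sits in the right tail, so the cutoff is the (1 - alpha) quantile.
    critical_z = NormalDist().inv_cdf(1 - alpha)
    return z, p_value, z > critical_z

# Invented average daily traffic counts, NOT real data from the episode.
before = [1200, 1185, 1210, 1195, 1205, 1190]
after = [1204, 1192, 1211, 1199, 1208, 1193]

z, p_value, reject_null = one_tailed_z_test(before, after)
print(f"z = {z:.2f}, one-tailed p = {p_value:.3f}, reject null: {reject_null}")
```

With these invented numbers the post-park mean is only slightly higher, so the test statistic stays out of the right-tail critical region and the null hypothesis is not rejected.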
And the stricter we make this test by demanding a smaller significance level, the smaller the critical region becomes. Noah runs his statistical test with a significance threshold of 5% and discovers there's not enough evidence to say that Median Heights has more traffic than it did pre-park. He rejoices in the knowledge that all of these NIMBYs could be wrong. If we have a specific direction in our hypothesis, like Median Heights having more traffic post-park, then the critical region of our test will only cover one tail, which indicates the change in the corresponding direction. Two-tailed tests are appropriate when our alternative hypothesis is only looking to find a difference of any kind between two populations. That means our critical regions for the test fall across two tails. For both kinds of tests, though, the significance threshold also determines the size of the critical region, which also has implications for how the tests compare. A one-tailed test puts all of the critical region for a given significance level into a single tail. That means it's more sensitive to finding a conclusion for that given tail, like a bloodhound on the scent, knowing what it's after. On the other hand, a two-tailed test with the same significance level for the same distribution will have two smaller critical regions, one in each tail. Kind of like a guard ready for disturbances in any direction, but only able to give each direction some of their attention. Ultimately, this means a one-tailed test is more sensitive than a two-tailed test if there's an effect in the hypothesized direction. For instance, let's say Joan's two-tailed test had a p-value of 0.06. That's why it turned out not statistically significant, even if it was just outside the threshold of significance, which was 0.05.
Looking at that number, she realizes that a one-tailed test for lower traffic in Modal Manor would have come out significant, since the critical region would be consolidated into one tail and so the p-value would be smaller at 0.03. That's because she would only need to account for one tail this time, not both, which means she could conduct a one-tailed test and confirm that hypothesis instead. Right? Wrong. Think about flipping a coin. Hypothetically, if you say in advance that heads wins and then you immediately get tails, you can't then decide that actually you wanted tails to win, even if you desperately wish that was the case. Once you've decided on a two-tailed test, you can't switch to a one-tailed test after the fact. The point of your test is to give you an unbiased judgment of your hypothesis by deciding what your critical region is going to be and then using data to work out whether the test statistic falls into that region or not. By making a choice after that first test, the second test is biased because of what you learned. And you're essentially adding some more area onto the critical region. That's especially important if your results are on the edge of significance. Making the switch from a two-tailed to a one-tailed test in that case will give you a significant result that isn't really reliable. Unfortunately, there are some researchers who pull this kind of thing to get significant results, and it's worth keeping an eye out for. This sort of thing can cause major problems in many situations, like evaluating whether clinical treatments are effective. Still, when used ethically, working out these differences is one of the most important tasks in all of statistics. Determining differences applies to all kinds of important areas of life, like deciding whether a drug improves the odds of survival in patients or whether someone's gender or race has an influence on their salary.
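The cost of switching tails after the fact can be seen in a small simulation. The sketch below is an illustration, not anything from the episode: it assumes a z-statistic that is standard normal when the null hypothesis is true, and the function names are made up. It compares an honest two-tailed test with a "peek, then pick the tail" one-tailed test, both nominally at 5%.

```python
import random
from statistics import NormalDist

def one_tailed_p_from_two_tailed(p_two, effect_in_predicted_direction=True):
    """Convert a two-tailed p-value to a one-tailed one for a symmetric test statistic."""
    return p_two / 2 if effect_in_predicted_direction else 1 - p_two / 2

def false_positive_rates(trials=20000, alpha=0.05, seed=0):
    """Simulate tests where the null is TRUE and count how often each approach rejects."""
    rng = random.Random(seed)
    two_tailed_cut = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    one_tailed_cut = NormalDist().inv_cdf(1 - alpha)      # about 1.645 for alpha = 0.05
    honest = switched = 0
    for _ in range(trials):
        z = rng.gauss(0, 1)  # test statistic when there is no real difference
        if abs(z) > two_tailed_cut:
            honest += 1      # two-tailed test chosen in advance
        if abs(z) > one_tailed_cut:
            switched += 1    # tail chosen AFTER seeing which way z points
    return honest / trials, switched / trials

print(one_tailed_p_from_two_tailed(0.06))  # Joan's tempting shortcut: 0.06 becomes 0.03
honest_rate, switched_rate = false_positive_rates()
print(f"two-tailed, chosen in advance: {honest_rate:.3f}")
print(f"one-tailed, tail picked after peeking: {switched_rate:.3f}")
```

Because the tail is picked after the data are in, both tails end up in play at the one-tailed cutoff, so the false positive rate lands near 10% instead of the advertised 5%. That is the extra area being added onto the critical region.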
And really, whether it's comparing neighborhood traffic or getting out of wearing a weird hat, statistics is a helpful road map, one that arms you with better ways to make decisions than flipping a coin. Still tails. If you're enjoying this series and are interested in taking the full Study Hall Real World Statistics course and earning college credit from ASU, check out gostudyhall.com or click on the button to learn more. And if you want to help us out, give this video a like, smash that subscribe button, and comment what you think of this hat. Thanks for watching. See you next time.