Understanding Hypothesis Testing in Statistics

The last few videos have focused on descriptive statistics, the branch of statistics concerned with looking at data and trying to describe it, and introduced a bunch of concepts like probability distributions. Now we're moving on to the other branch of statistics: inferential statistics. In a lot of analyses it's the next step, so we do a lot of the descriptive statistics first and then move on to inferential statistics. This is where we draw conclusions and make inferences, going from our sample back to our population. So we're going to introduce the concept of hypothesis testing now, along with a lot of definitions and a lot of terminology.

Before I get into the nitty-gritty of this module, I want to give a bit of a disclaimer, or maybe it's more of a warning. Because there are so many concepts we need to introduce at this point, you might at some points in the module think, "I just don't understand what this is," and we won't necessarily be calculating anything yet. What I want to encourage you to do is work your way through this module as best you can.

Do your best to understand each concept as it's introduced. Do some research, and look at some of the other learning resources that we have. But if you get to the end of the module and still think, "I don't understand that," that's okay at this stage.

In future modules we're going to come back to a lot of these concepts and link back to the theory introduced here. That's when we'll be doing the calculations, practicing them and applying them, and hopefully it will make a lot more sense. So for now, try your best to understand what you can. As we link into other concepts and calculations in later modules, this material becomes really important, so consider it your foundation.

So this is an introduction to hypothesis testing. I guess the first question is: what is hypothesis testing? It's a process, and generally speaking, when people say they're doing statistics, a lot of the time it involves hypothesis testing. We come up with a hypothesis, something that we want to test about our data or about a phenomenon that we're trying to explain or make an inference about.

We then use a hypothesis testing framework: we gather the data, go through the analysis, and use that data, in particular sample data, to either support our hypothesis or refute it. Previously we introduced the concept of a sampling distribution.

These sampling distributions form the basis of how we make inferences and decisions in hypothesis tests. What we're trying to establish is whether the observations we make in our sample data are likely to have occurred by chance, or whether they reflect real effects, such as associations or differences between groups. Depending on what our hypothesis is, we use probability and sampling distributions to make that decision.
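To make the role of the sampling distribution concrete, here is a small simulation sketch in Python. It is an illustration, not the course's method, and every number in it is a hypothetical assumption: a normally distributed population where the null is true (mean k = 120, standard deviation 10), samples of size 25, and an observed sample mean of 124. It builds the sampling distribution of the mean under "no effect" and asks how often chance alone produces a sample mean at least as far from 120 as the one we observed.

```python
import random
import statistics


def simulate_null_sampling_distribution(k, sigma, n, reps=10_000, seed=1):
    """Sample means drawn from a hypothetical population where the null is true.

    Assumes a normally distributed population with mean k (no effect) and
    standard deviation sigma; these values are illustrative, not from the course.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.gauss(k, sigma) for _ in range(n))
        for _ in range(reps)
    ]


def chance_proportion(null_means, k, observed_mean):
    """Fraction of null sample means at least as far from k as the observed mean."""
    gap = abs(observed_mean - k)
    return sum(abs(m - k) >= gap for m in null_means) / len(null_means)


# Hypothetical numbers: under "no effect" the population mean is 120,
# and our sample of 25 produced a mean of 124.
null_means = simulate_null_sampling_distribution(k=120, sigma=10, n=25)
print(chance_proportion(null_means, k=120, observed_mean=124))
```

With these made-up numbers the printed proportion should come out at roughly 0.05, since 124 sits two standard errors (10 / √25 = 2) away from 120; the exact value depends on the random draws.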

When we see an effect that is unlikely to have occurred by chance, we say it is a statistically significant effect. To test for one, we make a statistical hypothesis, which is usually a statement or claim about a population parameter, that is, something happening in the population. We actually need a pair of hypotheses: one that represents our research question or the claim we're making, and another that is its complement, the opposite of what we're claiming. If one is shown to be true, or at least looks likely to be true in terms of probability, then the other must be false. That's why they need to be complements of each other: they are opposing claims.

We start off with the null hypothesis. The type of hypothesis testing we're doing is, in full, called null hypothesis testing, so that's what we're going to be doing.

We start with a claim that says there is no effect, which we call a statement of equality. When we say equality, we don't mean that the values we observe will be exactly equal; by saying things are equal, we're saying there's no effect.

And so it depends on what your claim is. We might use terms like there is no difference between the groups. There is no association. There is no effect.

By saying there is no effect, we're saying there's equality between the things we're comparing. Then we have the complementary statement, the opposite, and that's our alternate hypothesis. We write the null hypothesis as H₀ (H with a subscript zero) and the alternate as H_A (H with a subscript A). The alternate is a statement of inequality, the opposite of whatever our null is. It's almost counterintuitive, because the alternate is usually what we're trying to prove, or what we believe is true.

It is usually our claim or our research question. When we run a study, we typically want to show that there's an effect, so the alternate is the one we actually believe is happening in the population.

But we start by assuming that there is no effect, and then we gather data. Either the data lead us to reject the null and accept the alternate, or they don't give enough support for the alternate, in which case the idea that there is no effect stands. Whatever the null hypothesis is, it will always be the statement of no effect, and the alternate will always be the complementary, opposing statement.

Almost all the time we state these hypotheses in words. In statistics we also use mathematical statements, and the different hypothesis tests we'll do each have their own mathematical notation. The notation helps us pin down what we're looking for and where: whether we're looking for an effect in a positive direction, a negative direction, or in either direction. Based on where we're looking for an effect, we describe the nature, or tailedness, of the test. You can have a left-tailed test, a right-tailed test or a two-tailed test.

A left-tailed test usually looks for a decrease, and a right-tailed test for an increase. You can see here that I've got my null and my alternate in all three cases. In each one we're comparing a population parameter to some value, the claim that we're making. Remember that the alternate is usually what we actually think is happening in the population, but we start off with the idea of equality, or no effect. So the nulls use a greater-than-or-equal-to sign, a less-than-or-equal-to sign, and a plain equals sign; mathematically, each of these represents the fact that there is no effect. The alternate is then the opposite statement.
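Written out symbolically, and assuming the parameter is the population mean μ and k is any value we compare it with, the three pairs would look something like this. In each row the alternate is the exact complement of the null, so exactly one of the two can hold.

```latex
\begin{aligned}
&\text{Left-tailed:}  && H_0: \mu \ge k && H_A: \mu < k \\
&\text{Right-tailed:} && H_0: \mu \le k && H_A: \mu > k \\
&\text{Two-tailed:}   && H_0: \mu = k   && H_A: \mu \ne k
\end{aligned}
```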

Left-tailed and right-tailed tests always make a directional statement: we're only looking for an effect in one direction. For example, we might expect a significant increase in something.

Maybe after taking a medication we would expect something to significantly increase. (What's something we'd want to increase? Of course, while I'm recording this I can't think of one.) Or we might want something to significantly decrease, and that one I can think of: often we want to decrease blood pressure. If we're developing a medication that's supposed to decrease blood pressure, that's the thing we're looking to observe. In a one-tailed test we're only interested in whether it decreases, so even if it increased we wouldn't be looking for that; we would only consider a significant decrease.
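The point about disregarding effects in the other direction can be sketched in Python. This assumes a z test, where the test statistic z follows a standard normal distribution when the null is true, and uses the p-value, the probability of a result at least as extreme as ours if the null were true. The function name p_value and the value z = 3.0 are hypothetical choices for illustration. A large positive z (a big increase in blood pressure) gives a p-value near 1 in a left-tailed test looking for a decrease, so that test never flags it, while the two-tailed p-value is small.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z, tail):
    """p-value for a z statistic under H0, for a 'left', 'right' or 'two' tailed test."""
    if tail == "left":   # H_A: parameter is smaller than k (looking for a decrease)
        return STANDARD_NORMAL.cdf(z)
    if tail == "right":  # H_A: parameter is larger than k (looking for an increase)
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "two":    # H_A: parameter differs from k in either direction
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Hypothetical blood pressure example: the statistic points to a large INCREASE.
z = 3.0
print(round(p_value(z, "left"), 4))  # left-tailed test cannot flag an increase
print(round(p_value(z, "two"), 4))   # two-tailed test does flag it
```

Here the left-tailed p-value is about 0.999, while the two-tailed one is about 0.003.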

That's how one-tailed tests work: you specify a direction, and if the effect goes in the opposite direction, you disregard it because it's not the direction you were testing for. A two-tailed test, as you can see here, is non-directional. We're not specifying that we're only looking for an increase or a decrease; the effect could go in either direction, and we'd be interested in it going in either direction. In this course we're only going to focus on two-tailed tests.

There are two reasons for that. I've mentioned the one-tailed test, even though we're not going to use it, because you'll see it in textbooks and on websites. It is a real concept, but in practice it has problems. Say, for example, we're testing the blood pressure medication and we really only want it to decrease blood pressure. As a researcher, if you found that it significantly increased blood pressure, you would be very interested in that. So the idea of only considering an effect in one direction isn't practical: we'd be interested in big enough effects in either direction, and two-tailed tests are what we want to do.

The second reason is that it's easier to get a significant result in a one-tailed test. We're going to go into the word significant a bit more, but to introduce the concept: when I say there's a significant effect, it means that the effect is unlikely to have occurred just by chance. It's actually easier for us to reach that conclusion in a one-tailed test, because all of the cut-off probability is placed in the one tail we're looking at, so a less extreme result is enough.
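The claim that one-tailed tests make significance easier can be checked with a short Python sketch. It assumes a z test with a standard normal sampling distribution and a cut-off probability (significance level) of α = 0.05, a conventional value that hasn't been formally introduced in this module; critical_z is a hypothetical helper. A one-tailed test puts all of α in one tail, while a two-tailed test splits it between both tails.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Smallest |z| that counts as significant at level alpha.

    A one-tailed test puts all of alpha in one tail; a two-tailed test
    splits it, with alpha / 2 in each tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


alpha = 0.05
print(f"one-tailed cutoff: {critical_z(alpha, 1):.3f}")
print(f"two-tailed cutoff: {critical_z(alpha, 2):.3f}")
```

This prints cutoffs of about 1.645 and 1.960, so, for example, a statistic of z = 1.8 in the hypothesized direction would count as significant in a one-tailed test but not in a two-tailed test.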

So it's almost cheating to do a one-tailed test instead of a two-tailed test, and we steer away from them entirely. They exist, but it's better statistical practice to always do a two-tailed test. So a two-tailed test is non-directional.

We can be looking for increases or decreases. The null hypothesis says there's no difference: if we're comparing two groups, there's no difference between the two groups; if we're looking at an association between two variables, there's no association.

Our alternate will say that there is. There is a difference. There is an association.

Notice that these hypothesis statements use population notation: they describe what we believe is happening in the population, or what we're going to make an inference about. When we do the calculations in a hypothesis test, everything else uses sample notation, because we use the sample data and sample statistics to go back to these hypothesis statements about the population. The statement itself will change depending on which hypothesis test we're doing, but essentially it reads like this: the null hypothesis says that in the population, the average is equal to k, where k is just any value we put there. The alternate's not-equals sign says that the population average is different from that value. We'll see this statement change, but at this stage we're focusing on two-tailed tests, and we use population notation.
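As a sketch of how population notation and sample notation fit together, the Python below tests H₀: μ = k against H_A: μ ≠ k using only sample quantities. It assumes a one-sample z test with a known population standard deviation σ, a simplification chosen for illustration; the readings, k = 120 and σ = 8 are made up, and one_sample_z is a hypothetical helper name.

```python
import math
from statistics import NormalDist, fmean


def one_sample_z(sample, k, sigma):
    """z statistic for H0: mu = k versus H_A: mu != k.

    The hypotheses are about the population mean mu, but the calculation only
    uses sample quantities: the sample mean x_bar and the sample size n.
    sigma, the population standard deviation, is assumed to be known.
    """
    n = len(sample)
    x_bar = fmean(sample)
    standard_error = sigma / math.sqrt(n)
    return (x_bar - k) / standard_error


# Hypothetical blood pressure readings from a sample of 10 people.
readings = [118, 125, 121, 130, 116, 122, 127, 119, 124, 128]
z = one_sample_z(readings, k=120, sigma=8)

# Two-tailed p-value: probability of a z at least this far from 0 if H0 is true.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"x_bar = {fmean(readings):.1f}, z = {z:.2f}, two-tailed p = {p_two_tailed:.3f}")
```

With these made-up readings x̄ = 123, z is about 1.19 and the two-tailed p-value is about 0.24, so this sample would not give a statistically significant result.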

And we're always going to have these complementary statements. Okay. So these hypothesis tests use these sampling distributions.

The sampling distribution always starts with the assumption that the null hypothesis is true, that there is no effect. We then gather data, and it leads us to one of two decisions. Either we reject the null, which said there's no effect, and conclude that there is an effect; we call this statistically significant. Or there isn't enough evidence in our sample data to reject the null, so we fail to reject it and go with the idea that there is no effect; we call this not statistically significant. Failing to reject the null doesn't prove there is no effect; it means our sample didn't provide enough evidence of one. From now on, we're going to keep the word significant to refer only to statistical significance.

Very often in everyday English we use significant to mean substantial. We'll say, "He significantly increased his wealth," meaning he increased it by a lot. We're going to reserve the word significant to only ever mean something we've established with a hypothesis test to be statistically significant or not statistically significant, not to mean substantially, greatly or notably. Use those words instead when you're talking about the magnitude of something.
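The two possible decisions can be written as a tiny Python sketch. It assumes the decision is made by comparing a p-value (the probability of data at least this extreme if the null were true) with a cut-off of α = 0.05; both the helper name decide and that cut-off are illustrative assumptions. "Fail to reject" is the standard phrasing for not having enough evidence to reject the null.

```python
def decide(p_value, alpha=0.05):
    """Turn a p-value into one of the two possible decisions.

    alpha = 0.05 is a conventional cut-off, assumed here for illustration.
    """
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0: statistically significant"
    return "fail to reject H0: not statistically significant"


for p in (0.003, 0.24):
    print(p, "->", decide(p))
```

A p-value exactly equal to α is treated as not significant here, which is one common convention.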
