Type 1 and type 2 errors are critically important to the study of hypothesis testing, but they're often quite misunderstood, as is the concept of statistical power. So this particular video is dealing with all three of those in a really neat way, and hopefully by the end you're going to understand exactly what this graph is all about, this one over here. If it doesn't make any sense to you just yet, don't worry, I'm gonna lead you through it. In fact, it's only when I myself started teaching this exact topic that I understood some of these concepts really for the first time, so hopefully I'll be able to open the window a little bit for you so that you can grasp things fully.

My name's Justin Zeltzer from zedstatistics.com. If you like this stuff that I do, feel free to tell a friend so the audience can grow and I can keep making videos like this, or you can just like and subscribe, do all those kind of things. But otherwise, come with me and let's explore type 1 and type 2 errors.

Okay, so type 1 and type 2 errors are firmly entrenched within the concept of hypothesis testing, so we need to do a bit of a recap on hypothesis tests. I think the best way of doing that is using the classic court case example, where there's a defendant on trial and we're going to give them the benefit of the doubt, so they're going to be innocent until proven guilty. So in our null hypothesis (this top line is called the null hypothesis) we say that the defendant is innocent, and we want to see if there's enough evidence to move from the null hypothesis to our alternate hypothesis. Realistically, it is this alternate hypothesis that we are seeking evidence for. So it's important to note that the default setting, or null hypothesis, is the reverse of what we're seeking evidence for. We are always conservative: we start with the assumption that the person is innocent in this case, and we're going to see if there's enough information to show that they're guilty. And this process of
being conservative is common to both the courtroom and statistical hypothesis tests.

So these four boxes represent the possible outcomes from a criminal trial. Let's consider the case where the person is actually not guilty. Now, we can never know whether the person is truly guilty or not, but let's just say that this person happens to be not guilty. If the jury pronounces them not guilty, then we've made what's called a true negative: we've correctly identified a lack of guilt, or a negative. So we haven't been able to budge from this null hypothesis, and it was correct for us not to budge from it. Now a false positive means that we've falsely decided that this person is guilty when in fact they happen to be not guilty. For the same reasons, down here, when the person happens to be guilty and we've called them not guilty, that's a false negative. And then of course we have a true positive down in this bottom right corner, where we've correctly rejected the null hypothesis to say that yes, this defendant is guilty.

Now it's important to realize that in these two cells, the ones on this off-diagonal, we're actually committing an error. In the top right we have what we call a type 1 error, and that means we've incorrectly rejected the null hypothesis. The probability of that is given the Greek letter alpha. It's also called the level of significance, and as we'll see, in a test for significance we can actually set that to be whatever we like; it just depends on how strict we're being with the evidence we require to reject that null hypothesis and pronounce this person guilty. Now a type 2 error, whose probability is given the letter beta, is when the person was guilty but we just didn't have enough evidence to find them guilty, so we've incorrectly failed to reject the null hypothesis. We also get this thing called power, which is the ability of the test to correctly reject a false null hypothesis. If the person was guilty, the power
tells us how likely we are to correctly convict them. Now be aware that beta plus 1 minus beta is equal to 1, where 1 minus beta is the power; similarly, 1 minus alpha plus alpha is equal to 1. So these are the probabilities looking across these rows.

But let's put this into a more general hypothesis test that is used across the statistical spectrum. I've generalized this hypothesis test such that the null hypothesis is simply that whatever we're testing has no effect, and the alternate hypothesis is simply that there is an effect. Now there are numerous hypothesis tests that fit this description. For example, in the medical world you might be testing an intervention, and the null hypothesis might be that the intervention is ineffective, or in other words has no effect, and the alternate is that the intervention is effective. Other examples are diagnostic tests: say we're testing for COVID, the person either has COVID or they don't, and the diagnostic test could come back either positive or negative. Other examples are in regression: if we're looking at particular variables, we might in the null hypothesis consider them insignificant, and we're going to see if there's enough evidence to suggest that they are significant. And a final example is when we are comparing group means, so two different groups or two different treatments are being compared. The null hypothesis is that their means are equal, and the alternate hypothesis is that the means are different.

You can see that in each of these cases the null hypothesis is almost like our conservative starting point. We really want to show that the alternate hypothesis is true, but because we're statisticians we're always conservative, and we make our default hypothesis, our null hypothesis, that there is no effect. So pretty much every hypothesis test is going to fit this description. What we'll have is, again, a true situation, which we can never know, and then we have our sample results. So we usually take a sample of
something and try to assess, using that sample, whether there is an effect or not, or we can say whether the effect is significant or not significant. So I'm hoping you're seeing the similarities between this and the courtroom example. You can see we've got the same things: we've got our false positives and false negatives, called type 1 and type 2 errors, and again we have this power down here as well, which again is the ability of the test to find an effect if indeed there is one. Okay, so let's see how this actually pans out with an example.

Now before we get to the practical example, I thought I'd do, for the first time, a bit of self-promotion. I've got a website called zedstatistics.com which has a whole heap of categorized statistical videos; you can see them here. Just thought I'd let you know in case people are like, what is this guy, what's he doing just putting up videos all the time? I'm in the process of putting together what I hope to be a nice complete set of beginner and intermediate stats for people to use, so feel free to tell your friends about it, those who are studying stats or those who are stats curious. Anyway, that's enough from me. Let's get back to the video and take a look at the example that we're going to be using, which is all about smoking cessation and whether that improves lung function. Let's check it out.

In assessing whether smoking cessation, or in other words stopping smoking, improves your lung function, we need to set up our null and alternate hypotheses. Because we're conservative, we're going to go with the null hypothesis being that there is no effect of smoking cessation on lung function. Here I've just said that the difference is zero. That's what that curly Greek delta represents: the difference in lung function between those that stop smoking and those that continue smoking. And our alternate hypothesis, which is the thing we are seeking evidence for, is that it improves lung function. So again it's that common
conservatism, where we put the no effect in the null hypothesis to see whether we have enough evidence to reject it. Then we can set up our 2x2 table again if we like, with the true situation over here, which is the reality of the effect of smoking cessation on lung function, and then our test result, which is what we actually find from the test we're about to do. Now that's all well and good, but let's go a little bit deeper and see how we can derive how big some of these probabilities are, such as the probability of a type 1 and type 2 error, and also have a look at the power. So this is where it's gonna get good, people.

Okay, so this is my favorite bit. We've got our null hypothesis that there's no effect of smoking cessation on lung function; in other words the difference, this delta, is actually equal to zero. Now let's think about this for a minute. If the true difference was exactly zero, meaning that whether you smoke or you stop smoking it doesn't affect your lung function, imagine you took a sample. Would you expect the sample difference to be zero? You'd probably expect some kind of random variation in your sample, right? It wouldn't be exactly zero. Potentially those that stopped smoking in your sample might have a slightly higher lung function, or maybe they might even have a slightly lower lung function than those that kept smoking, because when we take a sample we are susceptible to all the random variation that sampling entails. Sample means very rarely match up perfectly with the mean of the population.

So what a hypothesis test is going to do is set up this region up here, which is usually given a probability of 5%, so that would be 5% of the total area of this curve. We set for ourselves the probability of incurring a type 1 error, and that basically means that if our sample mean is too extreme, say it's up in this black shaded region, we're going to reject the null hypothesis. But don't forget, it's entirely possible that we're rejecting a true null hypothesis,
and this black curve is indeed that situation where the null hypothesis is true. So it's almost like we allow ourselves a 5% chance of getting it wrong, but that's something we just have to live with, right?

Now here's the best bit. Let's now presume that in fact the alternate hypothesis is true. If it was true, then there's going to be some difference in the lung capacity or lung function of those people that stopped smoking versus those people that keep on smoking. So if indeed there is an increase in lung function, this yellow curve must exist somewhere, right? This must be the true representation of the possible outcomes from our sample. Our sample is not centered at zero; our sample in reality is centered at delta. So what happens then is that if you look to the left of this black line (don't forget this is where we had our type 1 error; the black bit here was our type 1 error, that's where we would reject the null hypothesis), to the left of that point we are not rejecting the null hypothesis. So if we do not reject the null hypothesis, for the purpose of this yellow curve that's us committing an error, because we now know, given this yellow curve, that there was a difference. So if we've failed to reject it, this is the area which is going to result in us committing what's called a type 2 error.

So if we put the type 1 error label back, you can see it's quite neat. You've got this type 1 error, which we've defined from the level of significance of our hypothesis test, and wherever that line goes, that starts this rejection region. To the left of that will be where we have a type 2 error in terms of this yellow curve. So where's the power in all of this? Well, the power is everything else in this yellow curve, so it's this whole region here. Here's our power: essentially 1 minus the area in the type 2 error region. Now I actually found this really great as an explanation for how all these things interrelate, so it's probably worthwhile you thinking about this. Now I've got three questions for you, and
hopefully you can pause the video when I ask them and have a real think about them before I reveal the answer. I want you to think about how the power, which is in fact this region here, is affected when s increases. s is the underlying standard deviation of our sample. Now with more variation and uncertainty in our sample, these curves are going to become fatter, and if we hold everything else the same you can see that the area of overlap is going to increase. Therefore the power will have to decrease; you can see that the power now takes up a smaller proportion of that yellow curve.

Now what happens when n increases, which is your number of observations in the sample? Again, pause the video and have a think about how that might play out. What happens is that n influences how skinny or fat these curves are going to be. The higher the value of n, the more observations we have, the more confident we are in our sample means, and so the skinnier these curves are going to be. So if n increases we're going to get skinnier curves, so there's less overlap and the power is going to be much larger.

And finally, what happens if the difference increases? If these two curves get pushed further apart, again the overlap is going to decrease, so our power is going to increase. I guess on the surface of it that might not make so much sense, but think of it this way: if the true difference in lung function between those that stopped smoking and those that continued smoking was really large, we can probably be quite sure that our test will reject the null hypothesis. Our test will find that difference if indeed the difference is very large. If the difference is much smaller, it's gonna be a lot more difficult for our test to reject the null hypothesis.

So that is it, that's type 1 and type 2 errors and also a look at power as well. I've got a whole bunch of other videos on hypothesis testing which you can check out, and I'll put a link in the
description for the playlist for the hypothesis testing videos. But thanks for coming along for the ride, and I will catch you next time.
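
If you want to play with the three questions from the video yourself (what happens to the power when s, n, or the difference changes), here is a minimal Python sketch. It is not taken from the video. It assumes a one-sided z-test on the difference between two group means, a known common standard deviation s, n observations in each group, and normally distributed sample differences. The function name `power_one_sided` and all the numbers are made up purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(delta, s, n, alpha=0.05):
    """Approximate power of a one-sided z-test for a difference in means.

    Assumptions (illustrative, not from the video): known common standard
    deviation s, n observations per group, normal sampling distribution.
    """
    # Standard error of the difference between two group means.
    se = s * sqrt(2 / n)
    # Where the rejection region starts under the null curve (centered at 0).
    cutoff = NormalDist().inv_cdf(1 - alpha) * se
    # Type 2 error: area of the alternate curve (centered at delta)
    # that falls to the left of the cutoff.
    beta = NormalDist(mu=delta, sigma=se).cdf(cutoff)
    # Power is everything else under the alternate curve.
    return 1 - beta


# Hypothetical baseline values, chosen only to show the direction of change.
baseline = power_one_sided(delta=0.2, s=0.5, n=30)
more_spread = power_one_sided(delta=0.2, s=0.8, n=30)   # s increases
more_data = power_one_sided(delta=0.2, s=0.5, n=100)    # n increases
bigger_gap = power_one_sided(delta=0.4, s=0.5, n=30)    # difference increases

print(f"baseline power:       {baseline:.3f}")
print(f"larger s:             {more_spread:.3f}")
print(f"larger n:             {more_data:.3f}")
print(f"larger difference:    {bigger_gap:.3f}")
```

Under these assumptions, the larger s gives lower power than the baseline, while the larger n and the larger difference both give higher power, which matches the picture of the curves getting fatter, skinnier, or further apart. Setting `delta=0` makes the two curves coincide, and the function then returns alpha, the type 1 error rate.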