Transcript for:
ANOVA Variance and F Test

Fisher was able to see that the variation in multiple quantitative data sets actually breaks up into two types: you have the variance between the groups, and then you have the variance within the groups. If you're dealing with ANOVA, it's really important that you get this idea. Don't worry so much about calculating it (nobody calculates this stuff by hand), but you've got to have the idea of variance between divided by variance within. In fact, the sentence for the F test statistic is "the ratio of the variance between the groups to the variance within the groups," and a ratio means dividing, right, like a fraction.

So what is this idea of variance between and variance within? The technical formula would be this: the variance between is the sum of squares between the groups divided by the degrees of freedom between the groups, and the variance within is the sum of squares within the groups divided by the degrees of freedom within the groups. What does all that mean? It's a very difficult calculation, by the way, which is why you always let a computer calculate this, but you do want to see what's going on.

For the variance between the groups, we're comparing the sample mean of each data set, each sample. Remember, I'm thinking of my separated quantitative samples. If I separate those out and look at the sample mean of each one, how far is each sample mean from what we call the mean of means, that is, the mean you would get if you combined all the data together? So you look at all the data combined and ask how far that overall mean is from each individual sample mean. This is a way of measuring how different the sample means are. Now, the degrees of freedom depend on how many groups you have. K refers to the number of groups. Like if I had five groups, or, as in this example, four groups, the degrees of freedom between would be 4 minus 1, or 3. So in a sense, the variance between is a measure of how different the sample means are across all of your data sets.

But we need something to compare it to, so we compare it to the variability within the data sets, in other words, the variance within. If you look at the formula, x refers to each number in a data set, and you have to do this for every data set. Start with the quantitative data set from your first group: take every single number, subtract the mean of that data set, square the differences, and add up the squares. Then do it again for the second data set, and the third, and the fourth, and so on, and keep adding all of these up. Adding up the sums of squares from all the groups gives what's called the sum of squares within, and you divide that by the degrees of freedom within. Each column of data contributes its sample size minus one to those degrees of freedom, so if a data set had 30 values, that column would contribute 29.

There is a beauty to the mathematics in this for people that love math; this is actually one of the more beautiful formulas. Fisher really thought about it in terms of a total amount of variation, a total sum of squares. The sum of squares between and the sum of squares within actually add up to the total sum of squares, and the degrees of freedom between plus the degrees of freedom within add up to the total degrees of freedom.
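Here's a minimal sketch in Python of what the software is doing behind the scenes (the four groups and every value in them are made up just for illustration): it computes the sum of squares between and within, the two variances, the F ratio, and checks the sum-of-squares and degrees-of-freedom decomposition described above.

```python
import numpy as np

# Hypothetical data: four small quantitative groups (made up for illustration)
groups = [
    np.array([23.0, 25.0, 27.0, 24.0]),
    np.array([30.0, 28.0, 33.0, 31.0]),
    np.array([22.0, 20.0, 25.0, 23.0]),
    np.array([27.0, 29.0, 26.0, 28.0]),
]

k = len(groups)                                # K, the number of groups
n_total = sum(len(g) for g in groups)          # total number of values
grand_mean = np.concatenate(groups).mean()     # the "mean of means" (all data combined)

# Sum of squares BETWEEN: how far each sample mean is from the grand mean,
# with each squared distance weighted by that group's sample size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
df_between = k - 1

# Sum of squares WITHIN: for each group, take every number, subtract that
# group's own mean, square, and add up; then pool across all the groups
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = n_total - k

variance_between = ss_between / df_between     # mean square between
variance_within = ss_within / df_within        # mean square within
F = variance_between / variance_within         # the F test statistic: a ratio (fraction)

# The decomposition Fisher noticed: the pieces add up to the totals
ss_total = ((np.concatenate(groups) - grand_mean) ** 2).sum()
assert np.isclose(ss_between + ss_within, ss_total)   # SS between + SS within = SS total
assert df_between + df_within == n_total - 1          # df between + df within = df total

print(F)
```

One detail that's easy to gloss over when you describe it in words: in the sum of squares between, each group's squared distance from the grand mean is weighted by that group's sample size, which is part of the standard formula.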
In the ANOVA printouts from the computer you'll see these numbers. The software will calculate the F all the way for you, but these pieces give you an idea of how the F works.

Now the main thing is how do you read it, how does it work? Well, like all test statistics, look and see if your test statistic falls in the tail; it'll be a right tail determined by the critical value. If it does, then the sample data significantly disagrees with the null hypothesis, and the p-value is probably going to be very small. If the test statistic does not fall in the tail determined by the critical value, then the sample data does not significantly disagree with the null hypothesis, we're probably going to have a larger p-value, and we're going to fail to reject the null.

But what's the idea, why does that happen? Think about it this way. Suppose my population means were really different. Then the sample means for my individual quantitative data sets all come out different, and the variance between the groups is going to be a lot higher than the variance within, right? It's almost like an exercise in fractions. If you remember fractions (some of you are like, oh no, fractions), the F test statistic really is a fraction. Think of it as a higher number divided by a lower number. When you divide a higher number by a lower number, you get what we sometimes call an improper fraction; it can get really big. So that gives you a pretty large F test statistic. If the population means are really significantly different and the sample means come out different, then I'm likely to get an F that's very big, and it's going to be in the right tail.

Now what happens if my means were very close? If our population means were close, or my sample means were very close, what would happen? Well, now usually the variance between might be about equal to the variance within, or even less than the variance within; it might be just slightly bigger than the variance within, not a lot bigger. So if we think about a lower variance between and a higher variance within, what's going to happen? Think about fractions again: this would be like a proper fraction, a smaller number divided by a larger number, so the overall fraction is going to be pretty small. We're going to get a pretty small F. If your F comes out to around one, or less than one, usually that's telling you that your sample means are actually very close; you're probably going to fail to reject the null hypothesis, and the population means might be equal. But if the F comes out really, really large, then I know the variance between was a lot higher than the variance within, and that gives me the idea that the population means really might be different, so I'm going to be rejecting the null hypothesis.

That's the brilliance and the simplicity of this. It doesn't look very simple, it looks very complicated, especially how to calculate it, but the idea is actually very simple.
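To see how that decision rule plays out in software, here's a small sketch using SciPy on the same made-up groups from the earlier example (the 0.05 significance level is just an assumed choice): the computer gives the F and the p-value, and we check whether the F falls in the right tail past the critical value.

```python
import numpy as np
from scipy import stats

# Same made-up four-group example as before
groups = [
    np.array([23.0, 25.0, 27.0, 24.0]),
    np.array([30.0, 28.0, 33.0, 31.0]),
    np.array([22.0, 20.0, 25.0, 23.0]),
    np.array([27.0, 29.0, 26.0, 28.0]),
]

# Let the computer do the whole calculation: F test statistic and p-value
F, p_value = stats.f_oneway(*groups)

# Right-tail critical value for an assumed 5% significance level
df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
critical_value = stats.f.ppf(0.95, df_between, df_within)

if F > critical_value:   # F falls in the right tail
    print(f"F = {F:.2f} > {critical_value:.2f}: the sample data significantly "
          f"disagrees with the null (p = {p_value:.4f}), so reject the null")
else:                    # F is small (near 1 or below)
    print(f"F = {F:.2f} <= {critical_value:.2f}: fail to reject the null "
          f"(p = {p_value:.4f})")
```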
It's really about understanding how a fraction works. Okay, so that's the idea of the F test statistic. Next time we'll look more into this test, and I'll show you some of the software and how to calculate the test statistic with software; I definitely don't want to calculate this by hand. All right, thanks for joining me, and I will see you next time. This is Intro Stats with Matt Teachout.