uh here we're finally going to use Microsoft Excel to solve this exact same problem that we have done and then later on we'll solve a few additional problems we have the age of different females getting married in these different states we have the data that we've collected in each one of these states and we're trying to do this a Anova test at a 0.1 level of significance so we're just trying to see if the population means for each of these states of the age of people that get married women that get married if they're the same or or not basically so what you want to go ahead and do first of all is you go to the data tab up here and the data tab on the right you notice I have a data analysis button now you might not have that initially and this is this is a Microsoft Excel 2013 I think it is but anyway this was not here whenever I started you have to install it so if you don't see it there just go to Google and type in how to install data analysis tool pack for Microsoft Excel basically you just go into one of these menus click a few buttons and then it'll start it'll pop up there it's not it doesn't cost anything so just go ahead and enable it and you'll see it there then you want to go and highlight your data these are the raw data that you're going to do and then you hit data analysis and what you want to select is Anova single Factor we're doing single Factor Anova because the only factor involved here is the location of the population basically we're just that's the only control that's or the only thing that's different there so you go and hit OK and a dialog box pops up and it's asking you for the input range of the data there now you need to click that button and highlight and select all of the data and if you have labels at the top go ahead and select that too click this to force it to accept and your data is grouped by columns which means the data for the populations goes down and you have labels in my case labels in the first row and it's asking for Alpha so you want 0.1 that's your level of significance for your problem it's going to dump the output in this case into a new worksheet but you have some different options there if you'd like and then you just literally hit OK and then it calculates the answer there you go let me go ahead and zoom a little bit we'll go to view zoom and I'll set the zoom at 145. so you can see it a little bit better all right so here you go a novel single single Factor this is what all of that work that we did pages and pages of calculations of all of that data has been reduced down to about five seconds where you can see uh See the fruits of it all basically what you have at the very top this is the the the the actual conclusion down here and we'll talk about that here in just a second but up at the top it gives you some additional data first of all it tells you that New York which was the data in the First Column had 10 samples when you add them all up it was 201 and when you divide that by 10 it's 220.1 so it's giving you the the sample mean this is the sample mean right here and then it's also calculating the variance of the data inside of that set of samples right which we were doing manually earlier and then it does the same thing for Texas how many samples and their average this is the sample mean this is the most important thing from there and Oregon and Oregon sample mean so basically the idea is Anova is trying to say hey for these different states these are the different sample means and obviously it looks like Texas is a little bit different but statistically is this enough to say that one of these states population means of of the age of marriage is enough is is actually strong enough is it strong enough to say that uh that that we accept or reject that null hypothesis so here's the difference here here's the answer and this is very similar to the Anova table that we created by hand so you have source of variation now the only real difference in Excel is that they have they use the word groups they don't use treatments the word treatments most textbooks use the word treatments but I also told you that you can be called they can be called groups so you have treatments and then in in Excel you call them groups so you have SS T this is SST sum of squares between groups sum of squares among treatments same thing this is the number that we calculated before and you can extend it out a little bit and see a few more decimals if you want this is the same number over here and then you have SSE within groups in fact I kind of like these terms a little bit better because they're a little more descriptive it's a source of variation it's telling you between the different populations how much variation is there and within each population how much error is there how much spread is there basically this guy is SST this guy is SSE when you add them together you get the total variation which is exactly what we had before then it just lists the degrees of freedom we wrote those down on the table before and that depends on the problem you have number of populations number of samples that you have and that's the total degrees of freedom and then this is MST MST or the mean Square between groups and this is the mean Square within groups so you got to think of it in terms of variation you're thinking about what's happening between the different groups how is the data spread between the different samples that we took the different populations and then how is the data spread inside of each sample set that we have and I've talked about that about 20 000 times up until now preparing you for this point this is MST this is MSE all right and then finally we use that information to calculate F Well I'll expand this out a little more so you can see a few more decimals here this is the test statistic 2.865079 is exactly what we calculated by hand before this is the test statistic all right so basically this F is the Ms T divided by the MSE basically that's what you divide these two numbers and you get F so that's why it's listed in the table like that all right so that's that's that now forget about p-value for now I'll come back to that in a second that's the table also tells you the critical value this is f sub Alpha I was calling it f Alpha a lot of most books call it f Alpha Excel calls it f crit so basically all you're doing when you get the answer here is you're comparing F to F crit and you know it's a right-tailed test so if this number is bigger than F sub Alpha F crit then you reject the null hypothesis and if it's not then you fail to reject a null hypothesis now I went ahead and did this live so that you could see it but I already calculated this before and I added a couple of things here that I want to show you so let me just go ahead and zoom in just a little bit 145 and I'll push some stuff down we'll come back to some of the stuff later I just wanted to show you an F distribution here this F critical value this is what Excel is providing right here f is larger than that that means you're clearly in the rejection region so literally all you do after an anova test in Excel is compare this number to this number and then you either reject it or you fail to reject it if this number were less than 2.51 over here we would be failing to reject uh then all hypothesis now I have a few notes down here I've probably said most of them but I want to make sure that I've said them uh I want to point out that all of these values match our manual calculations so that's good news SS between groups is the same as the sum of squares among treatments SS within groups is the error inside of the sample set that we have SSE Ms between group is the same as MST and same thing here I was just basically repeating myself F crit on this table is the same as F Alpha that we've talked about before defines your rejection region if your test statistic Falls bigger than F crit then you reject if not then you fail to reject and so on this is a this is a review null hypothesis is that all population means are equal alternate is at least one different from the others all right now the interesting thing here is that I've said over and over again this test only really tells you if if one or more of these guys are different um basically outliers it doesn't tell you which one now you can kind of look at this and see that because it's so obvious I mean these guys are equal and this guy was different you can kind of tell Texas is the outlier but if you have 15 populations and they're very very small variations between the two you're not going to be able to be able to eyeball it like that Anova doesn't tell you which one is different it just tells you that one of them at least one of them is different and possibly more than one is different well that's about all I have to say about doing Anova as far as how to do it in Excel basically it gives you the raw data the sample means it calculates along with their variance and then it just puts the table together in the same way that we wrote it down and all you have you can read everything off here SST SSE m-s-e-m-s-t uh the uh F value and then the critical value oh I know one more thing I wanted to tell you Excel gives you one more piece of data and that's the p-value and that might confuse some people at first basically you don't really need it when you're doing this because all you need to do is compare this number to this number you don't really need the p-value but if you remember back to hypothesis testing p-value was just another way to make a decision it's exactly equivalent to comparing F to F Alpha it's exactly the same thing basically if the p-value that you have for your data is less than uh Alpha the alpha for your problem then you reject the null hypothesis so Alpha for our problem was 0.1 that means the Shaded area to the right is 0.1 the p-value is less than that less than 0.1 so that means you reject the null hypothesis so it's giving you this because it's a computer and it can calculate the exact p-value because we do use tables and things like that it's hard by hand to actually calculate the p-value for these kinds of problems but basically it's giving you the information there so there's two ways to make a decision in Excel here all you have to do is number one compare this and see if it's bigger than F crit if it is you reject the null hypothesis the second way is is the p-value that you calculate for our data is it less than alpha or not in this one it is less than 0.1 therefore we reject the null hypothesis and you can go back to my other lessons a long time ago and what a p-value is but basically if the p-value is less than the Shaded value then it means it has to be that means the F value has to be over here somewhere because it has to be a smaller shaded area and that means it's on the right hand side of f-crypt anyway if that rings a bell great if not then just you know kind of uh just look at it and see if it's less than Alpha and if it is reject so I'm going to go and wrap this up we're going to do several more of these guys in Excel just to give you a little more practice and some more intuition with using analysis of variance learn anything at mathandscience.com