Transcript for:
Chi-Square Goodness of Fit Overview

to come from group 2. All right, well, that means 50 times 0.5 gives us 25. And here the null hypothesis says that group 3 was supposed to get 40%, or 0.4, so I expect 40% of my group to come from group 3. That kind of makes sense. So I'm just multiplying. I'm using that old idea of taking a percent of a total, or multiplying the proportion times the total, something we covered at the very beginning of our time together. So 0.4 times 50 would give us 20, and my expected counts would be 15, 25, and 20.

Notice again the expected counts did not come out equal, because the proportions I'm assuming in the null hypothesis are not equal. So this is type 1 and this is type 2, and you can see how we're getting different expected counts. In type 1, all of the expected counts came out to 16.67, which was really just dividing 50 into 3 equal groups. Now I'm taking into account the specific percentages that are listed in the null hypothesis.

All right, now how do you calculate the test statistic from that? Well, here's the formula. It says to take the sum of the observed minus the expected, squared, divided by the expected. Again, you're basically just plugging in: for each group you take its observed count minus its expected count, square it, and then divide by its expected count. What that's doing, again, is measuring the difference between the observed and the expected. That's really the big key. You're trying to figure out if the observed is significantly different from the expected. That is, did my sample data significantly disagree with the null hypothesis? That's the point of a test statistic, so that's really what we're looking at. Obviously we're going to get negative numbers sometimes, so in true statistics fashion we're doing something almost like a sum of squares calculation. We're squaring all the
differences to get rid of the negatives, and then we're adding them all up. Now this dividing by E almost makes it like an average of squares. So this is the classic goodness-of-fit test statistic.

I can do this calculation for both of my examples, but I'm going to do this one for my type 2. Again, this was my type 1 goodness-of-fit test and this was the type 2 goodness-of-fit test, so I'm going to do type 2. If I'm assuming the null hypothesis was 0.3 for group 1, I'm going to use these expected values. My observed count for group 1 was 24: 24 people had that characteristic. If the null is true, I expected 15. So 24 minus 15, squared, divided by 15 gives you 5.4. That number 5.4 is called a contribution to chi-square. These are the numbers that get added up to get the chi-square test statistic.

Now what about group 2? In group 2 we observed 9 people that had that characteristic, and we expected 25, so quite a big difference there. Again, 9 minus 25, squared, divided by 25 gives us 10.24. That's the contribution to chi-square from group 2. And then in the third group we observed 17 people in the sample data that had the characteristic, and we expected 20 if the null hypothesis was true. So now we have 17 minus 20, squared, divided by 20, and we get 0.45. That's its contribution.

I always tell my class to look at the contributions to chi-square. Oftentimes the computer will show you these numbers, and you can see which groups have the biggest discrepancy with the null hypothesis. Looking at these numbers, you can see that group 2 had a much bigger discrepancy with the null hypothesis than, for example, group 3. Group 3 was pretty close. Group 2, though, had a pretty significant difference there. So that's why I always look at these
numbers and look at which one was the biggest.

Now if we just add those numbers together (that's what the sum in the formula means: add them up), you get 16.09. Chi-square does not act like a z-score or a t-score. Remember, z-scores usually get significant around 2. Chi-square can be very big. If you remember from our study of critical values, chi-square is not normal. It's squared numbers added up, and you can get very big numbers sometimes with your chi-square, so don't be surprised if this number gets really, really big.

The idea of it is the most important thing. If the chi-square is significantly large, that tells me my observed counts, my sample data, are significantly different from the expected counts from the null hypothesis. In other words, my sample data is significantly different from the null hypothesis. That's how this works. So the bigger it gets, the more of a significant difference I have.

Now I could do the same calculation with my type 1 here. In type 1, when we are assuming the proportions were equal, my expected counts were all 16.7, or 16.67. I could do a very similar calculation for this group. I went ahead and did it; I just didn't write it all down for you. The chi-square for this one was 6.759, so you can see this one was a lot smaller than the chi-square for the type 2 example. In other words, my sample data disagrees more with this null hypothesis (type 2) than it does with this null hypothesis (type 1). So the main thing here is just getting the idea about chi-square: how does chi-square work, and what does it tell me about the observed counts and the expected counts?

All right, now let's look at the assumptions. I kind of went out of order today, so we're going to look at the assumptions now. All right, I almost wrote
number 10 there all of a sudden. All right, so let's look at the assumptions. We want a random sample, or hopefully samples that are representative of the population, very much like the two-population proportion test. Individuals within and between the samples should be independent.

Now remember, in the two-population proportion test we wanted ten successes and ten failures, but that was when you were using the z-score test statistic, and when we looked at sampling distributions for proportions. This is now focusing in on chi-square, so we have a different distribution. To make sure your data set is large enough, we want all the expected counts to be at least five. So all the expected counts, or expected frequencies, should be five or greater. A lot of times in stat programs you'll see "expected value must be greater than or equal to five."

When you get your expected values, the computer will calculate all this. It'll give you the contributions to chi-square, and it'll also give you the expected counts, because it knows you need to check them. If any of these drop below five, this test doesn't fit really well with the way we're doing it. If you're doing the traditional chi-square calculation, you really need the expected counts to be at least five. So random, independent, and expected counts at least five are our assumptions.

By the way, if any of the expected counts drop below five, a lot of times computer programs will give you an error message. Always, always pay attention to those error messages. Something like "your sample size is too small for this test" is usually what the computer will say.

So this is the goodness-of-fit test, the multiple proportion test. All right, well, thanks for spending time with me. This is Matt Teachout and Intro Stats, and I will see you next time.
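
The calculation walked through above can be sketched in a few lines of plain Python. This is only an illustration, not the software used in the lecture: the function names `chi_square_gof` and `expected_counts_ok` are made up for this sketch, and the numbers are the ones spoken in the transcript (observed counts 24, 9, 17; null proportions 0.3, 0.5, 0.4). Note that the proportions as transcribed add to 1.2, so the type 2 expected counts add to 60 rather than 50; the sketch reproduces the arithmetic as spoken rather than correcting it.

```python
# Observed counts from the sample in the transcript (24 + 9 + 17 = 50).
observed = [24, 9, 17]
n = sum(observed)


def chi_square_gof(observed, expected):
    """Return (contributions, statistic), where each contribution is
    (observed - expected)^2 / expected and the statistic is their sum."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)


def expected_counts_ok(expected, minimum=5):
    """Check the assumption that every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Type 1: the null hypothesis says all groups have equal proportions.
expected_equal = [n / len(observed)] * len(observed)

# Type 2: the null hypothesis lists specific proportions
# (values as spoken in the transcript; they do not sum to 1).
null_props = [0.3, 0.5, 0.4]
expected_specific = [p * n for p in null_props]

contrib_equal, chi2_equal = chi_square_gof(observed, expected_equal)
contrib_specific, chi2_specific = chi_square_gof(observed, expected_specific)

print("Type 1 expected counts:", [round(e, 2) for e in expected_equal])
print("Type 1 chi-square:", round(chi2_equal, 3))
print("Type 2 expected counts:", expected_specific)
print("Type 2 contributions:", [round(c, 2) for c in contrib_specific])
print("Type 2 chi-square:", round(chi2_specific, 2))
print("Expected counts all at least 5:", expected_counts_ok(expected_specific))
```

With unrounded expected counts the type 1 statistic comes out to 6.76; the 6.759 quoted in the transcript appears to come from rounding each expected count to 16.67 first. The type 2 contributions match the 5.4, 10.24, and 0.45 worked out by hand, and they add to 16.09.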