Transcript for:
Expected Counts in Contingency Tables

Take a look at this a little bit. Here are my observed counts. Again, this is after the students tried to memorize information. They did control for confounding variables, like the volume of the music had to be the same, all those kinds of confounding variables. They used random assignment, so again, controlling confounding variables just like we talked about in experimental design. Then they classified each student as either high retention (they were able to memorize quite a bit of information) or low retention (they weren't able to memorize very much information). So they basically classified the students as one of those two.

Okay, so how does this work? Let's think about this a second. These are the observed counts: 10, 14, 11, 15, 18, and 7. But what we want to calculate is the expected counts: what we expect to happen if the null hypothesis was true. If being high retention or low retention is not related to music, then music doesn't matter, right? If music doesn't matter, everybody should have about the same percentage of high retention, and the same goes for low retention. That's what we mean by the conditional percentages being equal, or the same. Notice I'm not saying that the high retention percentage would ever be the same as the low retention percentage. What I'm saying is that the high retention percentage should be the same in all my groups. So for my three music groups, high retention should have the same percentage in all three, and low retention should have the same percentage in all three.

So if music doesn't matter, then I should just go to the totals, right? High retention would be 39 divided by 75. By the way, 75 is called the grand total: there were seventy-five people total, and 39 of them were
able to get the high retention mark. So 39 divided by 75 would be 0.52, so about 52 percent of all the people in all the groups got high retention: they were able to memorize a significant amount of information. Then we have the low retention people, who were not able to memorize very much information. Again, if music doesn't matter, then it should just be 36 out of 75, so that would be 0.48, or 48 percent. So 48 percent of all the people in all the groups were classified as low retention.

So think about it this way: if the categorical variables are not related, if music doesn't matter, then all of my groups should have about 52 percent high retention. That's the big key. That's what we mean by the distributions of conditional percentages being equal, or the same. That means that all of these should be 52 percent. You have to actually calculate an expected count for every one of these cells. Again, it's very important, if you're doing this by hand, which I was, that you label your expected counts so you know which expected count goes with which observed count, because that's going to be very important later.

So here's high retention, liked music. These are the people that got to listen to their favorite music and try to memorize information. Again, if music doesn't matter, then they should have had 52% high retention, but there were 24 people total in that group. So 52% times 24 is going to give you how many people in that group I expected to get high retention: 0.52 times 24 gives me 12.48. So what I'm doing is multiplying the percentage that's supposed to be the same (remember, under the null hypothesis the percentage would have to be the same) times the number of people in that group. Obviously these groups have different numbers of people, so my expected counts would not be the same. So now, if I look at the high and
disliked group: these are the people that had to listen to music they absolutely hated and try to memorize information. Again, if music doesn't matter, then they should have had 52% high retention, but this time they had 26 people in their data. So 52% times 26 would be 13.52. I expect 13.52 if the null hypothesis was true. Same thing here: if music doesn't matter, then the no music group should have had 52 percent high retention, so again I would want to do 52 percent times the number of people. It's important to realize these numbers here are called the observed counts; these are not expected counts. Don't use these numbers. The total number of people times that 52 percent gives me 13, so that would be the expected count for the high retention, no music group. You can kind of see I'm labeling them here.

Now what about low retention? Same thing. Remember, I don't think high retention is equal to low retention. What I think is that if the null hypothesis was true, then the low retention percentage should be the same in all the music groups. That's the idea: is the percentage the same in all the groups? 0.48 was the overall low retention, so if the null hypothesis is true and music doesn't matter, every group should have 48 percent low retention. That's how we get to the expected count. Group one was 24 people times 0.48, or 48 percent, so the expected count for low retention, liked music was 11.52. Same thing here: I'm multiplying the total number of people in that group times 0.48, the percentage that's supposed to be the same, and I get 12.48 for that expected count. Again, the no music group had 25 people, and 25 times 0.48 would be 12. So these are the expected counts that we would expect to happen if the null is true. Remember, these are connected to assuming that the null hypothesis was true.

Now, computers really don't calculate it this way, but this is
the way you want to think about it: the percentages should be the same if the null hypothesis was true and the variables are not related. But what the computer does, and this is the formula you'll see in stat books and in computer programs, is take the row total times the column total divided by the grand total. Now, whenever I saw that formula, I was always like, what does that have to do with the null hypothesis? They always said, you know, just calculate it, that's the calculation. But I'm very much a "why" kind of guy. I need to know why that's the null hypothesis.

So think about it this way. If the null hypothesis was true, then the percentage of high retention should be the same for all three groups. Well, how did we get high retention? That would be 39, the row total, divided by 75, the grand total. So this row total divided by the grand total is actually the percentage that's supposed to be equal in all my groups. But each group has a different number of people in it, so I have to take that into account when I calculate my expected count. So now I'm going to multiply that equal percentage, or proportion, times the column total. The column total is just the number of people in that group. So again, I'm just using that old formula, proportion times total. In a lot of ways, that's what this is doing. A lot of people have a hard time with that formula, but what it's really doing is saying that the percentage would be the same for all the groups, and then you're multiplying by how many people are in that group.

So, for example, if we look at the high, disliked group, right here: high retention, disliked. The row total was 39, times the column total, which was 26, divided by 75, the grand total, and that actually gives us the same answer as if we did this by percentage
or proportion, so that gives us the same 13.52. This is a little bit more efficient formula for computer programs and things like that, so this is the formula that's programmed into most computer programs.

All right, so we've got the idea of how they calculate the expected counts. Now we're going to go to the test statistic, the chi-square test statistic. We talked about this during the goodness-of-fit test. For each observed and expected count, we're going to take the observed minus the expected, squared (this is where it gets the name chi-square), divided by the expected, and then we're going to add them all up. The one thing is you do have to make sure the right expected count goes with the right observed count. The observed counts are the ones in the table. So if we look at liked, high retention (that's liked music, high retention, they memorized a lot of information), there were 10. That's the observed: 10 people were able to score high retention in the liked music group. So I would do 10 minus 12.48 (12.48 was my expected count for the high, liked group), square it, and then divide by 12.48. And you're basically going to do this for every single observed count.

This is a pretty tiny table. You can imagine if you had, you know, a 5 by 7, five rows and seven columns, you'd have thirty-five of these. By the way, this is a two by three table: it has two rows and three columns. Don't count the totals, just the ones that are not totals. So two rows and three columns, not counting the totals. The degrees of freedom, we said, was rows minus one times columns minus one, so 2 minus 1 times 3 minus 1 would give us two degrees of freedom for this table.

All right, let's go back to our expected counts here. For disliked music, high retention, the observed count was 11 and the expected count was 13.52, so
I'm going to do 11 minus 13.52, square it, and then divide by 13.52, and so on. We're going to take each observed count. For no music, high retention, we have 18. Notice, by the way, that the no music group actually did quite a bit better than either the liked or the disliked music group. It's kind of interesting: the liked and disliked music groups did about the same, and neither did as well as the no music group. So it seemed like the silent room people were able to memorize information a lot better than people that were listening to some kind of music.

Okay, so let's take a look now. 18 was the observed count, minus 13, my expected count, squared, divided by 13. And I'm going to do the same thing for the low retention numbers.
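
For anyone who wants to check the arithmetic from the walkthrough, here is a minimal Python sketch of the same steps. It is only an illustration, not the software used in the lecture: the variable names and the dictionary layout are assumptions made for this example, and the counts are the six observed counts read off the table (high retention 10, 11, 18 and low retention 14, 15, 7 for the liked, disliked, and no music groups). It computes the expected counts both ways described above (proportion times group size, and row total times column total divided by grand total), then sums (observed minus expected) squared over expected across all six cells.

```python
# Observed counts from the 2x3 table (rows: retention level, columns: music group).
# The layout and names here are illustrative assumptions, not from the lecture.
observed = {
    "high": {"liked": 10, "disliked": 11, "no music": 18},
    "low": {"liked": 14, "disliked": 15, "no music": 7},
}
groups = ["liked", "disliked", "no music"]

# Totals: row totals (39 and 36), column totals (24, 26, 25), grand total (75).
row_totals = {r: sum(observed[r].values()) for r in observed}
col_totals = {g: sum(observed[r][g] for r in observed) for g in groups}
grand_total = sum(row_totals.values())

# Way 1: the overall proportion for the row, times the number of people in the group.
expected_by_proportion = {
    r: {g: (row_totals[r] / grand_total) * col_totals[g] for g in groups}
    for r in observed
}

# Way 2: row total times column total, divided by grand total.
expected_by_formula = {
    r: {g: row_totals[r] * col_totals[g] / grand_total for g in groups}
    for r in observed
}

# Chi-square statistic: sum of (observed - expected)^2 / expected over every cell,
# pairing each observed count with its own expected count.
chi_square = sum(
    (observed[r][g] - expected_by_formula[r][g]) ** 2 / expected_by_formula[r][g]
    for r in observed
    for g in groups
)

# Degrees of freedom: (rows - 1) * (columns - 1), not counting the totals.
degrees_of_freedom = (len(observed) - 1) * (len(groups) - 1)

for r in observed:
    for g in groups:
        print(r, g, observed[r][g], round(expected_by_formula[r][g], 2))
print("chi-square:", round(chi_square, 2), "df:", degrees_of_freedom)
```

Both ways of computing the expected counts give the same six numbers (12.48, 13.52, 13, 11.52, 12.48, 12), which is the point of the row total times column total formula: it is the equal proportion times the group size, just written in one step.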