So this video is about using two-way tables. Now, you would have seen two-way tables before, but this time I'm framing it a little bit differently: we're talking about investigating the association between two categorical variables. So let's just take a look.

I'm going to survey males and females about their cheese preference. Some people like hard cheese, some people like soft cheese. So I go to the first person and I say, "Are you male or female?" They say, "I'm male and I like hard cheese." I go to the next person and they say, "I'm female and I like soft cheese," and then "I'm male and I like hard cheese," "I'm female and I like hard cheese," "I'm male and I like soft cheese," and I get a big long list of data points. Now, if I get a big long list of, say, 50 or 100 data points, that list isn't nearly as easy to interpret as a two-way table would be. So let's instead put this into a two-way table. Putting it in a table like this is going to make our life much, much easier and faster.

Now, something I need to mention about two-way tables is that you need to put your explanatory and response variables in the right place. By convention, we always put our explanatory variable here, across the top, and we always put our response variable here, down the side. The hypothesis here is that your sex determines what kind of cheese you are more likely to prefer. So sex explains cheese preference.

So you walk out onto the street, you start surveying people again, and you go: right, you're male and you like hard cheese; you're female and you like hard cheese as well; and you're in this category, this category, this category, and, oops, five, and that category, and this one, and this one, and this one, and we start tallying things up.

Side note: this way of tallying is really stupid. You should stop tallying things this way and tally them like this instead. So one person likes this, two people like this, three people like this, four people like this, and five people
like this, and then you draw another box, and another box, and another box. I like this way of tallying. It gets used a lot in Southeast Asia, and I think some places like France and Brazil use it as well. I think it's way better than those tally marks. It takes a little bit of getting used to, but I love it.

All right, so we can see we have eight males that like hard cheese, five females that like hard cheese, four males that like soft cheese, and seven females that like soft cheese. Let's replace those tally marks now with some numbers. All right, much better.

Now we can come up with our totals here. It looks like 13 people in total like hard cheese, and four plus seven is eleven like soft cheese. There were seven plus five, so twelve total females surveyed, and there were twelve total males surveyed. And this and this should both add up to the same number, 24.

All right, so this is an interesting sort of situation, because we surveyed the same number of males as females, but that might not be the case. Maybe we surveyed way more females than males, or vice versa. Let's just make a small change here: 14 males who like soft cheese instead of 4. That brings our total here (males) to 22, and it brings our totals here and here to 21 (soft cheese) and 34 (everyone surveyed).
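As a rough sketch of the tallying step, here is one way the big long list of responses could be turned into a two-way table with totals. The function name `two_way_table` and the way the list is rebuilt from the counts are assumptions for illustration only; the counts themselves are the ones from the survey after the change to 14.

```python
from collections import Counter

# Cell counts from the survey (after changing 4 to 14).
counts = {
    ("male", "hard"): 8,
    ("female", "hard"): 5,
    ("male", "soft"): 14,
    ("female", "soft"): 7,
}

# Rebuild the "big long list": one (sex, cheese) pair per person surveyed.
responses = [pair for pair, n in counts.items() for _ in range(n)]


def two_way_table(observations):
    """Tally (explanatory, response) pairs into cell counts and totals."""
    cells = Counter(observations)
    column_totals = Counter(sex for sex, _ in observations)
    row_totals = Counter(cheese for _, cheese in observations)
    return cells, column_totals, row_totals, len(observations)


cells, column_totals, row_totals, grand_total = two_way_table(responses)

# Explanatory variable (sex) across the top, response (cheese) down the side.
print(f"{'':6}{'male':>8}{'female':>8}{'total':>8}")
for cheese in ("hard", "soft"):
    print(
        f"{cheese:6}{cells[('male', cheese)]:>8}"
        f"{cells[('female', cheese)]:>8}{row_totals[cheese]:>8}"
    )
print(
    f"{'total':6}{column_totals['male']:>8}"
    f"{column_totals['female']:>8}{grand_total:>8}"
)
```

The row totals and the column totals both add up to the same grand total, which is a quick check that nothing was miscounted.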
Okay, now that we've surveyed way more males than females, it's really hard to tell whether males prefer soft cheese over hard cheese more than females do, and so on. But if we use percentages, it's going to be way easier to compare. So I'm going to redraw this table, but this time I'm going to do something called column percentages.

All right, so let's calculate our column percentages. Now, this only works if you've chosen to put the explanatory variable along the top here; otherwise things aren't going to make a lot of sense. So: explanatory variable at the top, response along here.

So we do 8 divided by 22 and we put our answer in here as a percentage, so 8 divided by 22 times 100. That's going to give me 36.36 recurring percent, and I can put that in here. Now, how do we interpret that? That's the percentage of total males that prefer hard cheese. I can do the same here: 14 divided by 22 times 100, and that's going to be 63.63. Now, when I add those two together I should get 100 percent, right, because they're the only two options. If you add them together you'll get 99.99, but that's because of rounding here. If you took the exact numbers and added them together, you'd get 100.

Finally, we can do the females, hard and soft, by doing 5 divided by 12 and 7 divided by 12.
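The column-percentage arithmetic can be sketched in a few lines. This is only an illustration under assumed names (`column_percentages`, `table`); each cell is divided by its own column total, so every column adds up to 100 percent.

```python
def column_percentages(table):
    """Convert counts to percentages of each column total.

    `table` maps an explanatory category (a column) to a dict of
    response category (a row) -> count.
    """
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * n / total for row, n in cells.items()}
    return result


# Counts from the survey: explanatory variable (sex) as the columns.
table = {
    "male": {"hard": 8, "soft": 14},
    "female": {"hard": 5, "soft": 7},
}

percentages = column_percentages(table)
for sex, cells in percentages.items():
    for cheese, pct in cells.items():
        print(f"{sex:6} {cheese:4} {pct:.2f}%")
```

Rounded to two decimal places, the male column comes out as 36.36 and 63.64 and the female column as 41.67 and 58.33. Keeping the unrounded values until the final display step avoids the small "99.99" discrepancy that shows up when already-rounded numbers are added.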
So when I do that, I get 41.67 and 58.33, which add up to 100. I should note, I just realized I stuffed up my rounding here: it's 63.64, not 63.63, and that does mean that those add up to 100 now.

Okay, so this is useful, because now I can say that, from the survey I did, about 36% of males prefer hard cheese whereas 41% of females prefer hard cheese, and 63% of males prefer soft cheese whereas 58% of females prefer soft cheese.

Now I could show that graphically, and I'm going to do it here with a nice little column graph. My explanatory variable is on the x-axis, and my cheese preference percentage is on the y-axis. Now let's look: males prefer hard cheese 36% of the time, so a little line about there, like that. Females prefer hard cheese 41% of the time, so a bit higher there. Obviously you'd be a lot more careful with yours. Now, the rest of it is all soft cheese, so because we're going all the way up to 100, we can just draw in the rest of our bars here.

Okay, now what does that mean? Well, we'd better label some stuff up. All right, and we have a nice little finished graph here. These should be straight lines, and these should be the same width, you know that. But I have a legend here that says soft cheese is in pink and hard cheese is in blue. When we look at this graph, we can say something like: it appears that females prefer hard cheese a little more than males do. We can also say that it appears that both males and females prefer soft cheese over hard cheese, because they're both more than 50% soft cheese. This really allows us to compare the two categories of one variable, male and female, across the two categories of the other, soft cheese and hard cheese.

But we're not limited to two categories per variable. We could have asked them about hard cheese, soft cheese, and grilled cheese maybe, or some other three-option question. So here's an example: I'm surveying years 1 to 6, years 7 to 12, and university students, and I'm asking them about their
potato preferences: do you prefer boiled potatoes, mashed potatoes, or chips? Those are the percentages I gave them. I can put them on this graph: years 1 to 6, years 7 to 12, and uni. The percentages are here, and I'll label that all up to get a graph that looks something like that. Now, use a ruler, please. Mine's really ugly, but you should get the general idea of what I'm doing. I've got my legend here, I've got my title here, and I've put all that information in there.

All right, that's how you use two-way tables to investigate the association between two categorical variables.
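For anyone who would rather not draw the bars by hand, here is a minimal text-only sketch of a 100% stacked bar for each column. The potato percentages are not stated in the transcript, so this uses the cheese counts from earlier; the function name `stacked_bar`, the bar width of 50 characters, and the symbols are all assumptions for illustration.

```python
def stacked_bar(parts, width=50):
    """Draw one 100% stacked bar as text.

    `parts` is a list of (symbol, percentage) pairs whose percentages
    add up to 100. Cumulative rounding keeps every bar the same width.
    """
    bar = ""
    cumulative = 0.0
    drawn = 0
    for symbol, pct in parts:
        cumulative += pct
        end = round(width * cumulative / 100)
        bar += symbol * (end - drawn)
        drawn = end
    return bar


# Column percentages for the cheese survey: '#' is hard, '.' is soft.
bars = {
    "male": stacked_bar([("#", 100 * 8 / 22), (".", 100 * 14 / 22)]),
    "female": stacked_bar([("#", 100 * 5 / 12), (".", 100 * 7 / 12)]),
}

print("Cheese preference by sex (# = hard, . = soft)")
for sex, bar in bars.items():
    print(f"{sex:6} |{bar}|")
```

Because both bars are the same length, the eye compares proportions rather than raw counts, which is the point of graphing column percentages. A third response category, as in the potato example, would just be one more (symbol, percentage) pair in each list.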