Hi everyone, this is intro stats, and today we're looking at analyzing two categorical data sets. So you have two different categorical variables, or you've asked people two different categorical questions, and you're starting to analyze those two variables: how related are they, and what are some of the key statistics we can calculate when dealing with two categorical data sets?

The first thing we really want to create is something called a contingency table. A contingency table just summarizes all the counts for the two categorical variables. In the fall 2015 semester, we asked the statistics students at our college two questions: what social media do they prefer, and do they have at least one tattoo or not? It's kind of interesting, funny data. You can see here I have Tattoo and No Tattoo, and then these are the answers the students gave: Facebook, Instagram, Other, Snapchat, and Twitter. By the way, this is fall 2015; I'm sure if we redid the study today we'd get some different numbers. A table like this is called a contingency table: a summary of the counts for two different categorical variables.

When you're looking at a contingency table like this, the first thing is: don't create it by hand. I mean, you can use tally marks and try to count how many people said they had a tattoo and said Instagram (in our data there were 41 of them), but I wouldn't want to count that by hand. Most programs can create contingency tables for you, and in the next video I'll show you how to create this contingency table with software. Once you get into bigger data sets, you're really not going to be creating these things by hand.
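As a rough sketch of what that software is doing behind the scenes, here's a minimal Python example that tallies raw responses into a contingency table. The responses below are made-up toy data, not the fall 2015 survey, and `contingency_table` is just a hypothetical helper name for illustration.

```python
from collections import Counter

# Hypothetical raw survey responses (NOT the fall 2015 data):
# each tuple is (preferred social media, tattoo status) for one student.
responses = [
    ("Instagram", "Tattoo"),
    ("Instagram", "No Tattoo"),
    ("Facebook", "No Tattoo"),
    ("Snapchat", "Tattoo"),
    ("Instagram", "Tattoo"),
    ("Twitter", "No Tattoo"),
]

def contingency_table(pairs):
    """Count how many responses fall in each (row, column) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})  # row categories, alphabetical
    cols = sorted({c for _, c in pairs})  # column categories, alphabetical
    # A Counter returns 0 for combinations nobody gave, so empty cells show 0.
    return rows, cols, {(r, c): counts[(r, c)] for r in rows for c in cols}

rows, cols, table = contingency_table(responses)
print(cols)
for r in rows:
    print(r, [table[(r, c)] for c in cols])
```

Swapping the order inside each pair would put tattoo status in the rows and social media in the columns, the same row/column switch you can make in any software.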
Let's take a look at the contingency table a little bit so we understand the individual cells. You want to look at what's a column and what's a row. By the way, these can be switched: if you want tattoo to be the rows and social media to be the columns, you could do that. I have it set up so social media is the rows and tattoo or not is the columns.

For example, if I look at one number, this 56 right here, notice it's in the No Tattoo column and the Facebook row. So these are people who did not have a tattoo and said Facebook is their favorite social media, and there were 56 of them. If I pick another number, like 11, notice it's in the Tattoo column and the Snapchat row. So these are 11 people who have a tattoo and prefer Snapchat; both of those things are true about these 11 people.

Now notice we have totals. If you're going to do percentage analysis (and remember, categorical data is all about proportions and percentages), then you need an amount divided by a total, so you'll see these totals. Some computer programs, like StatKey, label them "All" instead of "Total." I kind of prefer "Total," but the label may differ from program to program. Each total is the total for its row or column, so it's important to know which total you're talking about; there are lots of different totals in a contingency table. If I look at 124, notice that's this row total: 41 plus 83 gives 124. That's the Instagram row, so a total of 124 students preferred Instagram. Or if I look right here at 241, notice that's at the end of the No Tattoo column, so this is the total for all the people who said they did not have at least one tattoo: 241 students did not have a tattoo.

Now what about this number in the very bottom right? The very bottom right number in a contingency table is very famous; it's called the grand total. We actually had a grand total of 326
statistics students in this data, so this is what we call the grand total. Notice a couple of things. If you add the column totals, 85 and 241, you get 326. If you add up all the row totals, they also add up to 326. But you can't add the row totals and the column totals together: that gives you twice the grand total, not the grand total. Those are a few features of a contingency table.

By the way, you'll sometimes hear people refer to a contingency table as a two-way table. Most people in the stats world call it a contingency table, so that's the better name, but you will hear "two-way table" sometimes.

Tables always have a size, and the size is always the number of rows by the number of columns. Don't count the labels, and don't count the totals; the totals don't count toward how big your table is. So it's the number of rows, not counting totals, by the number of columns, not counting totals. If we look at the rows, we have Facebook, Instagram, Other, Snapchat, and Twitter. That's five rows (Total doesn't count). For columns, we have Tattoo and No Tattoo (again, Total doesn't count). So this is a five by two table. If you looked at the raw data, the question that asked whether you have a tattoo has only two possible answers, yes or no, and the question about which social media you prefer had only five responses. That's where the five by two comes from. If I put tattoo as the rows and social media as the columns, it would be a two by five table, but the way this one is set up, it's a five by two table.
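Here's a small Python sketch of those margin facts, using a hypothetical three by two table (made-up counts, not the survey data) and a hypothetical helper named `margins`. It checks that the row totals and the column totals each add up to the grand total, that adding both sets of totals together gives twice the grand total, and that the size counts only the category rows and columns.

```python
# Hypothetical counts (NOT the survey data): rows = social media, columns = tattoo status.
table = {
    "Facebook":  {"Tattoo": 4, "No Tattoo": 10},
    "Instagram": {"Tattoo": 7, "No Tattoo": 9},
    "Twitter":   {"Tattoo": 2, "No Tattoo": 3},
}

def margins(tbl):
    """Return (row_totals, col_totals, grand_total) for a dict-of-dicts table."""
    row_totals = {r: sum(cells.values()) for r, cells in tbl.items()}
    col_names = next(iter(tbl.values())).keys()
    col_totals = {c: sum(cells[c] for cells in tbl.values()) for c in col_names}
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

row_totals, col_totals, grand = margins(table)

# Size = category rows by category columns; the totals are not counted.
size = (len(table), len(col_totals))

print(row_totals)
print(col_totals)
print("grand total:", grand, "size:", size)

# Row totals and column totals each add up to the grand total...
assert sum(row_totals.values()) == grand == sum(col_totals.values())
# ...but adding both sets together counts every student twice.
assert sum(row_totals.values()) + sum(col_totals.values()) == 2 * grand
```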
All right, let's get right into it. When we're analyzing this kind of data, it's really all about percentages; we've already talked about how analyzing categorical data is all about percentages. There are a bunch of different percentages we look at with two categorical data sets.

The first one is called a marginal percentage, marginal probability, or marginal proportion; these words all refer to the same idea. With a marginal percentage, you're asking a question that involves only one of the two variables, not both. Let's look at an example: what percentage of all the students have a tattoo? A couple of things to notice. First, the question only asks about tattoos; it doesn't mention social media. That's a classic sign you're dealing with a marginal percentage of one variable. Also, the word "of" is very important: "of all the students." The word "of" tells you which total you're dealing with. If you're taking a percentage of all the students, you should use the grand total as your total, since you want to include all the students. Later, when we get to conditional percentages and I ask something like "what percentage of the Twitter students have a tattoo?", that changes the total, and you won't be using the grand total anymore. But when it says "of all the students," we use the grand total as our total.

We learned in categorical data analysis that we're always trying to find an amount out of a total, so it's really just about finding the amount and the total. That can be tricky, because a contingency table has a lot of different totals and a lot of different amounts, so you really have to pay attention to what the question is
asking. It says "of all the students," so I know I'm using the grand total as my total. But what's the amount? We're looking for tattoos, so how many people have a tattoo, and where would I find that? Tattoo will be either a row or a column; in this case it's a column. Here's the Tattoo column, and if I go down to the very bottom of that column, that's the total for tattoo. So this is the number of people who said they have at least one tattoo: 85 of them, out of everybody, out of all the students. So it's 85 divided by 326.

Notice both of those numbers came from the totals. They're not in the body of the table; they're in the totals, or what we call the margins. The amount and the total both come from the margins, and that's where the marginal percentage gets its name.

Like we learned in categorical data analysis, we divide 85 by 326 on our calculator and get 0.26073, and it keeps going. Remember, when we did categorical data analysis, we liked to round proportions to three decimal places, the thousandths place, so I'm rounding to this 0 here, the third digit to the right of the decimal. By the way, remember that the decimal equivalent of a percentage is called the proportion, so this is a proportion. Don't forget that word; it's a really important word in categorical data analysis. I look one digit to the right of that 0: it's a 7, so I round up, which means adding 1 to the 0, and I get approximately 0.261. That's the proportion of all the students who have a tattoo. If the problem had asked what proportion of all the students have a tattoo, I could leave my answer as 0.261.
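Here's the same arithmetic as a minimal Python sketch, using the two margin numbers from this table (85 and 326). The helper `marginal_proportion` is a hypothetical name for illustration; it relies on Python's built-in `round` for the three-decimal rounding.

```python
def marginal_proportion(margin_amount, grand_total, places=3):
    """Amount out of the total, where both numbers come from the table's margins.

    Rounds to three decimal places (the thousandths place) by default.
    """
    return round(margin_amount / grand_total, places)

# Margin numbers from the table: 85 students with a tattoo, 326 students in all.
p_tattoo = marginal_proportion(85, 326)
print(p_tattoo)  # 0.261
```

One caution: Python's `round` breaks exact ties (a trailing 5) by rounding to the even digit rather than always rounding up, so it can differ from classroom rounding in rare cases. That doesn't come up here, because the digit after the thousandths place is a 7.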
But the problem asks for a percentage, so I'm going to multiply the answer by 100%; remember, that's how we convert a proportion into a percentage. So 0.261 times 100, or moving the decimal two places to the right, gives 26.1%. Your percentage should be rounded to the tenths place, and notice it already is: usually you have one digit to the right of the decimal in a percentage, which is a very standard way to round percentages. So 26.1% of all the students had a tattoo. That's a marginal, one-variable percentage.

Now let's start getting to percentages that involve both variables, because that's where the real analysis comes in: I want to understand how the two variables work together. One that we like to calculate is called a joint probability, joint percentage, or joint proportion; again, three names for the same idea. The first one we're going to look at is called the intersection percentage or intersection probability. Look at the word "and"; it's very important here, and usually you'll also see the word "both," because you want two things to be true about the same person or object. So what percentage of all the students both