Hello everybody, this is Dr. Alvarez. This is Lecture 7, Comparison of Means, in Sociology 303, Statistics. Today we're going to be looking at the specific form of bivariate analysis known as comparison of means, which literally just means comparing the averages of different groups. I'm going to show you how to describe and interpret that, I'm going to show you how to test hypotheses using a comparison of means, and we're also going to look at the first application of a dummy variable.

So: we're going to introduce comparison of means, and we're going to look at a dummy variable example. What's a dummy variable, again? A dummy variable is a dichotomous variable that's coded as zero and one. So yes or no, agree or disagree, anything that's dichotomous. Did you vote in the last election, yes or no? That's one and zero. I'm going to show you why it's important. Then we'll do tests of significance for comparison of means, and then we'll look through some examples.

I hope that sounds exciting to you. This is about 30-some slides, so this is going to be on the longer side. I put some section markers in here so that you can take breaks at certain points. Obviously you can pause it whenever you want, so I guess I don't need to worry about that, but I just want to let you know it's going to be a little bit longer of a lecture. Okay, let's get going.

Comparison of means. This is used when you have an interval-ratio dependent variable, when the dependent variable is a number, an actual number, not a category: the amount of income you earned last year in dollars, the total number of minutes it takes you to commute to campus every day, or the total number of minutes it takes you to find parking on campus. Any
number, right? Rate, from zero to 100, your feelings towards the current President of the United States, with zero being absolutely detest and 100 being absolutely love. Something like that. That's a dependent variable. Your independent variable can be either nominal or ordinal, meaning categorical.

Now, you can also use this for dichotomous, or rather, more specifically, dummy dependent variables. I'm going to show you how.

I will make one quick caveat, which is that sometimes we treat ordinal variables as if they are interval-ratio variables. When would we do something like that? If you get a chance, go into the GSS 2014; there's a variable in there called wealth. Look at the distribution of that variable. It looks like an ordinal variable, because it is, you know, zero to one thousand dollars of wealth, one thousand to five thousand, five to twenty-five thousand, et cetera. It looks like an ordinal categorical variable. However, it has like 15 categories or something like that, and so sometimes we might treat it as if it is an interval-ratio variable. I'm going to try to avoid that in this class, simply because it can be a little bit confusing, but I do want to point this out because in real-life statistics you will see some people treat ordinal variables as if they are interval-ratio.

We're going to focus on using comparison of means on only two types of dependent variables. If you want to write this down with a little star next to it: we will use comparison of means when our dependent variable is either an interval-ratio variable or a dummy variable, and our independent variable is nominal or ordinal. Boom. That's the thing that I want you to know from this slide.

Okay, we can represent this here with our table of bivariate analysis. You should be used to this by
now; I've shown it to you a bunch of times. I've highlighted it in this, I don't know what you call that. Salmon? I said salmon. Who am I? I'm bougie. Sorry, no jokes. Okay: red, pink, whatever the heck that is. When our dependent variable is interval-ratio and we have a nominal or ordinal independent variable, we use comparison of means. Does everybody see that? All right, good.

When we do a comparison of means, we get the mean for each category of your independent variable. And like any other statistic we calculate in this class, you will describe and interpret that statistic. Are we clear about that? Just like everything else, you have to describe and interpret it.

In the next couple of examples, we're just looking at the comparison of means analysis. We're not going to look at the inferential test yet; we'll do that next. So the first thing we'll do is look at the analysis, then we'll look at the inferential test, the test of significance, and then we'll put those things together and run through some examples. That's our plan. Let's do this thing.

So, comparison of means: education and health. The dependent variable here is the number of days felt healthy and full of energy. Literally, that's just the number of days, from 0 to 30, that you felt healthy and full of energy. It's a number, right? What's the independent variable? Your highest degree earned. Are those categories? Yes, they are; we can see the categories right here. The research hypothesis would be that there's a positive relationship between education and health. So what do we know that means? This researcher believes that education will have a positive
relationship with health: as education increases, what will happen to health? It will increase. Education will have a causal impact on health; as education rises, this will cause health to rise as well. That's the underlying assumption here.

I have not written it out here, but what's the implied null hypothesis? We can always know this without being explicit about it, just based upon what we know about the null hypothesis. The null hypothesis is that there is no relationship between education and health, or that education has no significant effect on health. All of those are the null hypothesis. Right here, though, we're just thinking about the research hypothesis, because we're just looking at the analysis. Let's run through what we would do.

So what does this table tell us? It tells us, for every category of education, what was the mean number of days felt healthy and full of energy. Do you see that? We can get some additional information: the median, the range, the standard deviation. We also have the sample size for each one of these categories. Why some of this stuff matters will become clear in some future lectures.

To describe this, we would simply tell the reader what we see here. Those who have less than a high school degree averaged 18.2 days feeling healthy and full of energy last month. For those with a high school degree, 19.28 days. Junior college degree, 20.24 days. Bachelor's degree, 20.76 days. Graduate degree, 21.93 days.

What does that look like? What might your interpretation be? There does seem to be a positive relationship between these two variables, doesn't there? What is happening as education is increasing? Are the
mean number of days going up, are they going down, are they remaining the same? They appear to be going up, don't they? 18, 19, 20, 21, 21 again, almost 22. Do you see that? So there does seem to be a positive relationship between education and number of days feeling healthy and full of energy, which is how we're measuring health here, just so we're clear.

If there was no relationship between these two variables, what do you think that would look like? There are actually two examples of this. The first is if they generally all had the same number. If all of these were just 19, 19, 19, 19, 19, that would imply that there's no relationship between education and health, because there are no differences across educational categories in the number of days feeling healthy and full of energy. Does that make sense? Secondly, if there was no clear pattern here. If this was 18, and this was 29, and this was 16, and this was 20, and then this was 17 again, if it just bounced around with no clear pattern, that would be another time when we would suggest that there's not a clear relationship between education and health. Is that what we see here? No. We see a very clear pattern here.

So again, write this down: when you describe a comparison of means, you're giving each category and the mean associated with that category. When we interpret it, we're trying to come to some determination about whether there is a relationship between these two variables.

Here's something that you definitely want to write down. We are not using the same rules that we might use for, say, crosstabs, where we're looking for a 10 percent difference.
These are not even percentages, right? That's not the measure that we use. Yes, just like crosstabs, we're looking for a clear pattern, but you have to use your own judgment to determine whether, say, going from 18 to 19, from 19 to 20, from 20 to 21 and almost 22 actually represents a big difference. Is going from 18 to almost 22 a big difference? I don't know. I'll give you my own personal opinion about this. Generally speaking, there are about thirty days in a month. So if we go from the lowest category to the highest category, we're increasing by almost four days. That's about a 10 percent difference between the bottom and the top in terms of the maximum number it could possibly be. Is that a big difference? I'm not so sure. To me it looks like a pretty weak positive relationship. There is a clear relationship, and it looks sort of weak to me, because a difference of 18 days versus 22 days, from less than high school all the way up to a graduate degree, doesn't really seem to be that big of a difference. Do you understand what I'm saying?

How did I come to that conclusion? I used my own judgment. That's what you're going to have to do, and then you have to describe to me your justification for how you came to it. Now, I will tell you, I'm not always going to be looking for you to say weak or strong, but you do need to come to an interpretation of it. You could say there's only a four-day difference between the lowest education and the highest education, and that's helpful for the reader. It is usually helpful to say strong or weak, but I'm not going to force you to do that. I'm just encouraging you to put as much information in your interpretation,
to use your judgment, to empower you to use your judgment about what you think is going on here. Are we clear about this? This is how we describe and interpret a comparison of means.

Okay, I'm going to go to another example, and here we're going to use a dummy variable. We've talked about dummy variables before; I said it's a type of variable that you need to know and be able to identify. There is a variable in the GSS that asks how frequently you lend money to friends and family, and it runs from, you know, more than once a month, once a month, once a week, more than once a week, et cetera. I have recoded that variable into a very simple question: in the last 12 months, did you lend money, or did you not lend money?

Normally, when we construct a dummy variable in this way, we code the thing that we are interested in as one and the thing that we are not interested in as zero. If I'm interested in lending, then I code anybody who lent as one, and anybody who did not lend I code as zero. Makes sense to everybody? So what percentage did not lend, what percentage gets coded as zero? 55.2 percent. What percentage did lend? 44.8 percent. Are we clear so far?

This is really useful for us, because it turns out that when you code something as zero and one, it has some interesting mathematical properties. We're going to use an example, and I'm going to force you to write this down in your notes. Let's say we have three people in this room. There are actually no people or cats in this room currently, but let's imagine there were. I ask each of them if they lent money to a friend or family member in the last 12 months. Person one says yes; write that down. Person two says no. Person three says no. You got that written down? So that means we have a yes, a no, and a
no. Directly to the right of that, for the yes, write a 1. For the no, what are we going to write? Zero. So we have: person one, yes, is a 1; person two, no, is a 0; person three, no, is a 0.

Let's imagine we want to take the average, the mean, of this variable. Now, normally, if we have a categorical variable, we can't take the mean of it; it doesn't make a whole lot of sense. In this case, for a dummy variable, we can. How will we do that? We add up the numbers: 1 plus 0 plus 0 equals what? One. If I divide that by the number of people in my sample, that's 3, so 1 divided by 3, what do I get? I get .3333. So write that down: 1 plus 0 plus 0 equals 1, divided by 3, equals .3333.

Now that you've got this number, I want you to sit back for a second and think about this. Let's pretend we didn't do any math at all and I just simply asked you, hey, what percentage of people in your sample lent money to a friend or family member in the last 12 months? Well, one out of three did. What percentage of your sample is that? It's 33 percent, isn't it? Now compare that 33 percent to the average we got when we did the math. We got .33; if we turn that into a percentage, what do we get? 33 percent. Do you see that?

This is one of the interesting mathematical properties of dummy variables: if we take the mean, or average, of a dummy variable, we get the percentage of people who answered 1 on that dummy variable. If I were you, I would write that down. Now, what that number 1 means depends on the variable that we're looking at. In our case, what would it be? It would be the percentage of people who lent money to friends or
family.

Let's look at this in action. Here we're going to look at lending and gender. This particular researcher has a hypothesis that women will lend more than men. Here I'm using the lending dummy variable. Just so we're clear, what are the dependent variable and the independent variable? The independent variable is gender, or sex, and the dependent variable is lending. This researcher believes that your gender identification will have a causal effect on your lending; more specifically, that if you are a woman, you will be more likely to lend. We can actually test that hypothesis right here. We're looking at a comparison of means using the lending dummy variable, with sex as the independent variable.

What percentage of men lent money to friends or family in the last 12 months? Can you see it? It's .4684, meaning 46.8 percent of men lent money to a friend or family member in the last twelve months. How about for women? Almost 43 percent of women lent money to a friend or family member in the last twelve months. Do you see that?

How would I describe and interpret it? I would say, looking at the comparison of means between sex and lending, it shows that roughly 47 percent of men... let me get these variables right again. I'm sorry, my cat was doing something crazy out in the foyer, so I got distracted. 47 percent of men lent money to a friend or family member in the last 12 months; 43 percent of women lent money to a friend or family member in the last 12 months. That's my description. So men were more likely to lend money to friends and family, by around four percentage points. There is not support for this research hypothesis, is what you come to when you evaluate the research hypothesis,
right? Because men actually lend money to friends and family more frequently than women do. I'm giving the thumbs-up to my partner, who just quieted the cat for me. She's the best. Does everybody see what we just did?

All right, let's do another example, again using lending to friends and family. Here this researcher believes that whites will lend more than other groups. The independent variable is race; the dependent variable is lending money to friends and family in the last 12 months. What's the null hypothesis here, just so we're clear? That race will have no effect on lending.

So let's describe this table. Looking at a comparison of means between lending money to friends and family in the last 12 months and race, we see that 42 percent of whites lent money to friends and family in the last 12 months, 58 percent of blacks did, and 45 percent of those who are categorized as other did. That's just the dumb race variable from the GSS. It's a historical artifact; it's a long story, and it's really dumb, but we're using it here for simplicity's sake.

So we just described it. What would our interpretation be? Looking at this, we would come to the conclusion that blacks are much more likely than whites to have lent money to friends and family, by almost 16 percentage points, and those who are classified as other are also more likely to lend than whites, by almost three percentage points. So whites actually have the lowest level of lending to friends and family. Do you see that? So is there support for this research hypothesis? There's not.

So when we describe and interpret, we tell the reader what percentage of each category did the thing, or what the mean of each category is, and then we tell them what those differences look
like and how they're meaningful. Do you see what I'm saying? That's how we describe and interpret a comparison of means. I hope you wrote that down.

So let's make it a little trickier. Let's add in our tests of significance. We're going to focus on two things here: a t-test and an ANOVA. You ready? Now might be a good time for you to take a drink of water, you know, a sip of your beer. Let's do this thing.

So what inferential tests are associated with comparison of means? For crosstabs it was chi-square. For comparison of means we use two different ones: a t-test and an ANOVA.

We use a t-test if we compare the means of just two groups, males versus females. That's true even if you have an independent variable that has many different categories. Let's say you're using religious affiliation, or the country in which somebody was born. That could have many, many different categories, but if you're only comparing two of those categories, you would use a t-test. So if we wanted to compare Catholics and evangelicals, even though our religious categories could include Muslim, Jewish, Catholic, evangelical, Pentecostal, Baptist, Episcopalian, all these numerous categories, if we're only comparing two of them, we use a t-test.

We use ANOVA if we compare more than two groups. When we use the ANOVA we get an F test, and then we also use something that's called a Bonferroni post hoc test. This is an additional test that gives you more information to describe and interpret for your readers, and it's very useful for us. People
sometimes forget about it, so I like to call it the Beefaroni test, because it's an easy mnemonic device. You know what Beefaroni is? If you are a working-class or lower-middle-class individual, you probably know what this is. Chef Boyardee: it's pasta noodles in this, you know, meat sauce. I haven't had it in quite some time, but when I was growing up I had it all the time. I used to make it for myself when I was a teenager, or hell, even younger than that. You open up two cans of Beefaroni, you put them together, you add in a slice of American cheese once it gets nice and hot, you mix that all together, and you've got yourself a big, super unhealthy, super processed, super fattening meal. You're welcome for that image. If you haven't had Chef Boyardee before, I encourage you to walk out to your local Rouses, or Piggly Wiggly if that's your thing, if you're in the South, wherever it is that you can go, buy a can of Beefaroni, heat it up, and try it. It will put an unhealthy smile on your face. That pasta goodness is how I like to think of the Bonferroni. It's a good reminder that the ANOVA uses an F test, and then you also have to get this pasta goodness to give you additional information about that test.

So let's be serious for a second. What are you taking from this slide? There are two different inferential tests we use for comparisons of means, and I would write this down. If you're only comparing two means, whites versus blacks, males versus females, upper class versus working class, you use a t-test. If you're comparing more than two categories, we use an F test and a Bonferroni, excuse me, a Beefaroni. Why we also have to use the Beefaroni test will become obvious to you shortly, I promise you. So if you're asking why we are talking about this, I'm going to illustrate
this to you in just one second. Let's start looking at some examples. But just to remind you, going back to a table you've seen before, this is our types of bivariate analysis. Before, we were looking at the analysis for comparison of means. Now we're focusing on the tests of significance that are associated with comparison of means. Are we clear? That's where we're going. I'm now about to show you what the t-test looks like, so prepare yourself. Ready? Let's go.

Here's a comparison of means, and here I've gone back to do a test of significance for whether men or women are more likely to have lent to friends and family members in the last 12 months. So what happens when you run a t-test? The top portion, the group statistics, is the actual comparison of means that you've seen before. When you run the t-test in SPSS, it gives you the original comparison of means. That's very useful for us, because we can describe and interpret what we see there. Almost 47 percent of males lent to a friend or family member in the last 12 months; 43 percent of females lent to a friend or family member in the last 12 months. Therefore males are more likely to lend to a friend or family member, by about four percentage points. Boom: describe and interpret.

But then we have to go down to this independent samples t-test in order to determine if this is a statistically significant difference. This has two steps, and I encourage you to write this down, because it's going to be in your homework and it's going to be on the exam, and you're going to need to be able to do this. Are we ready?

Step one: you need to determine whether to read the top row or the bottom row of the output. How do we do this? By looking at Levene's test for equality of variances. Specifically, we
are looking at the test of significance for the equality of variances. What is this doing? Technically, it is looking at the variance within males and within females, comparing them to see if they're equal or not. I'm not going to ask you that question. The thing that I want you to know is that you have to be able to look at the significance for the equality of variances and figure out whether you are going to read the top row or the bottom row. How do you determine that? If this significance is at 0.05 or above, you read the top line. If it's below 0.05, you read the bottom line. That is all that I want you to know. The first step in your inferential test is to determine which row you are going to interpret as you read through the rest of this output. You do that by looking at the equality of variances, more specifically the p-value for it, its significance: if it's 0.05 or above, you read the top line; if it's below 0.05, you read the bottom line. Are we clear about this?

So then we go to the second step. You ready? We go to the significance in the second half of the output and we look for the p-value. It will always be in this significance column right here. The only question is whether you read the top one or the bottom one. In this particular instance, does it matter? It sure doesn't. However, you do need to know whether to read the top one or the bottom one. In this instance we're reading the bottom one, because this p-value over here, the one for the equality of variances, is below 0.05. So we read the bottom row. What's the p-value that we obtain from this t-test? 0.169. What's our rule if we're at the 95 percent confidence level? That
we're looking for a p-value that's, what, below 0.05. Is this p-value below 0.05? No, it's above it; .169 is greater than 0.05. So is this a statistically significant difference that you would expect to find in the population? No, it is not. So what would we do with respect to the null hypothesis? We would fail to reject the null hypothesis.

So again, let me start from the top. Write this down; make sure you have it in your notes. When we run the t-test, it's going to give us the means for each group, which we have to describe and interpret. It also produces the independent samples t-test; this is the actual t-test. So after you describe and interpret the means, step one, you look at the equality of variances to determine which row of the t-test you're going to interpret: if it's at .05 or above, you read the top row; if it's below 0.05, you read the bottom row. Step two, we look at the p-value for the t-test and determine if it is a statistically significant finding, whether we will reject or fail to reject the null hypothesis. That's how it works.

And if this helps, let me see if I can draw on this for you for a second. This is my terrible drawing: this is step one, and this is step two, using this information. You're welcome for my fine drawing skills. This is why I make the medium bucks, because I can draw things like this. Step one, figure out which row you read. Step two, actually look at the p-value and determine if it is a statistically significant finding.

Now why did I actually circle all of this? Because, technically speaking, what I would like for you to tell me about the t-test is the t value, one point three
seventy-six, that is 1.376; the degrees of freedom, df, 1233.552; and the associated p-value, .169. That's actually how you would say it: the t-test that we get has a value of 1.376, with about 1,233 degrees of freedom and an associated p-value of .169. .169 is above the critical value for the 95 percent confidence level of 0.05; therefore there is not a statistically significant relationship between sex and lending money, and therefore we fail to reject the null hypothesis. That's why I did it the way that I did. Okay, let me erase this junk real fast. Oh, it's like magic. And then let's keep going.

Okay, are you ready for the ANOVA example? Let's do it. Here I've done the racial comparison, because we might compare whites, blacks, and others. Notice something: if I just run the ANOVA by itself, it does not give me descriptives. It does not tell me what the means are for each group. It only gives me just this right here. We're again looking at the lending variable, and it just gives me this ANOVA section, which is an F test. You see that? Are we clear so far? So what do we do? We look at the F test and determine if there's a statistically significant difference.

Let me be clear, and write this down right now: the ANOVA result by itself only tells us whether there is a statistically significant difference amongst any of our categories. It does not tell us which categories are statistically significantly different from each other. In other words, it doesn't tell us if whites and blacks are significantly different on lending, or whites and others, or blacks and others. ANOVA doesn't tell us any of that. It just tells us whether, overall, there is a difference. In this particular case, is there a statistically significant difference
between any of the categories? Yes. Here's the significance: it's .000. Does that mean that the p-value is zero? No. It means a very small number, smaller than the alpha level, the critical value that we need to determine statistical significance at the 95 percent confidence level. That value is 0.05, and a very small number is smaller than 0.05. That means there is a statistically significant difference amongst these categories that we would expect to see in the population. Therefore we reject the null hypothesis. We just don't know, though, which of those differences are statistically significant, because the ANOVA doesn't tell us that. You want to make a guess where we get that from? It's from our yummy, delicious pasta goodness, the Beefaroni test, which we also run when we run the ANOVA. I'm going to show you how to do that in a different video. The Beefaroni test will show us that additional information. Are we clear?

This down here is what the Beefaroni test looks like. Let's go through it together step by step. Remember, there were three groups: whites, blacks, and others. We care about all three; we're not comparing just two of them in this particular example, so we ran an ANOVA with the Beefaroni. What does the Beefaroni test do? It looks at the race of the respondent, I and J in this output. What that really means is that it takes the first category, I, and compares it to each of the other two categories, J. It compares whites versus blacks, it gives you the size of the difference between the two groups, and then it does a test of
significance for those groups. Are whites and blacks statistically significantly different in their lending? They are. How do we know that? Because that p-value is very small; it's less than .05. So there is a statistically significant difference between whites and blacks. Which one lends more? In this particular case we can use the mean difference to help us figure that out. Whites minus blacks gives a difference of -.27. In other words, there's a negative difference: the white mean is smaller than the black mean, so when we subtract those two things we get a negative number. Do you see that? And it is a significant difference.

How about we compare whites to others? It is a negative difference, meaning that the other mean is bigger than the white mean, so others lend more than whites. Is that difference statistically significant? What's the p-value that we get there? It's .735. Is that greater than or less than .05? It's much greater. What does that indicate to us? It indicates that there is no statistically significant difference between whites and others. Is there a difference? Yes, others lend more than whites. Is that difference statistically significant, one that we would expect to find in the population? No, it is not.

So then the Beefaroni test goes to the next category down, blacks, and it compares blacks and whites, and blacks and others. Do you see that? Black minus white now gives us a positive number, because we know that blacks lent more, so their average was higher than whites'. Is it the exact same number? It is, but now, because we're subtracting the smaller number from the larger number, it's positive. Is it statistically significant? Yes, it's the exact same thing that we saw up here,
right, .000. What's new here? Black versus other. How big is that difference? It's .20. Do blacks lend more than others? They do. Is that difference statistically significant? It is. How do we know? The p-value is .015. Is that greater than or less than .05? It's less than .05. So blacks lend money to friends and family more than others do, and that difference is something that we would expect to see in the population. Make sense?

Then we get down to the final category, other. It does other versus white. Did we see that before? Yeah, we saw white versus other. When we did white versus other, it was -.06986. When it's other minus white, it's .06986: same number, just a different sign. The p-value is exactly the same, because it's the same comparison. Do you see that? It's not statistically significant. Other versus black: have we done that before? Yes, it's the exact same as black versus other, .20619. When we put other first, it's now -.20619, and the p-value is exactly the same.

Let's stop for a minute and make a couple of points. You ready to write some stuff down? Number one: the ANOVA does not give us the actual means. Notice there are no means up here, no descriptive statistics. I'm going to show you how to get those descriptive statistics in another video, because it's sometimes useful. I just didn't put it here because I wanted to show you what the ANOVA results look like with the Beefaroni on their own. So, number one, when you run the ANOVA, even with the Beefaroni, it doesn't tell you what the actual means are. Number two: when we run the
ANOVA, it just gives us the ANOVA with no other information, which says whether there is any statistically significant difference between any of your categories. It's just one general test. Here's the important part of that: when we evaluate our null hypothesis, we will specifically use the ANOVA. In this particular case, the p-value is .000. This is less than .05, therefore there is a statistically significant difference between the categories that we expect to find in the population, and we therefore reject the null hypothesis. We use the ANOVA to evaluate, or in other words test, the null hypothesis.

Number three: we use the Beefaroni test, the Bonferroni test, to figure out which differences amongst the categories are statistically significant. You must describe and interpret the information in the Beefaroni test. That's a lot of pasta goodness there that we want to make sure we describe and interpret for our reader. So you'll go through and try to figure out which of those differences are statistically significant. Makes sense? And we did that: we laid out that the white versus black difference has a p-value of .000, so it is statistically significant and we expect to find it in the population. We also saw that the black versus other difference is statistically significant, and we expect to find that in the population. However, the difference between whites and others is not statistically significant. Makes sense? You will need to know how to do this.

Number four: this is not written down anywhere, and as a secondary point, this is one of those emails that you can send to me to say, hey, I got this point, Dr.
Alvarez. All right, number four: why do we use the ANOVA? Write this down. Why are we using ANOVA in the first place? Every test of significance has some potential for you to be wrong. If we're at the 95 percent confidence level, what's the probability of an error that we've said is acceptable? About 5 percent. If we're at the 99 percent confidence level, what's the percent chance we're going to be wrong? It's about 1 percent. If we're at the 90 percent confidence level, it's 10 percent. Every time we do an additional significance test, we are adding on additional probability of being wrong. Do you see what I'm saying? Every time we do an additional comparison, there's an additional possibility that we could be wrong. The white versus black comparison: 5 percent chance of being wrong. The white versus other comparison: 5 percent chance of being wrong. The black versus other comparison: another 5 percent chance of being wrong. Those are all additional chances of being wrong, so additional tests of significance lead us to a situation where we are incurring additional probability of error. That's a bad thing; we don't want that.

That's the usefulness of the ANOVA. The ANOVA is one test for some difference somewhere in those comparisons, so we incur just one 5 percent chance of being wrong when we run that ANOVA. That's the good thing about ANOVA: it minimizes the chance that we are going to be wrong in our test of significance. That is why we use it to evaluate our null hypothesis, instead of going into these multiple comparisons in our Beefaroni. Do you see what I'm saying? Email me this point. Do your best job of trying to articulate it, email it to me, and you will impress me, and I will appreciate it. Does
that make sense to you? The ANOVA is just one test instead of multiple tests, and we use that to evaluate our null hypothesis. Then we go to our Beefaroni analysis, and there we look for the differences that might be statistically significant. But we're not evaluating our null hypothesis based on those. We're just describing those results so that our readers understand what we found there. Make sense? Good. Very, very good.

So let's put it all together and start doing some comparison of means along with some tests of hypotheses. We're going to have a research hypothesis, we're going to have a null hypothesis, we're going to look at some t-test results, and then we're going to describe and interpret them and test our hypotheses. We're going to go through some examples. If I were you, I would pay close attention to these and use them as models for your homework and for your exams. Are we clear? Another good time to have a sip of water, or a drink of whatever you're drinking. Maybe coffee, I guess, because maybe you need to stay awake. Maybe some tea; there are some real health benefits of tea.

Okay, t-test example. A researcher is interested in the impact of children on Christmas spending. Her research hypothesis is: households with a child under 18 will plan on spending more on Christmas gifts than those who do not have a child under 18 in the household. So here, what is the independent variable and what is the dependent variable? I'm really bad at that; I've got to stop that. My wife describes me like this: you know how some people are tone-deaf, they can't hear different tones? She says I'm tone-mute, that I actually can't produce different tones. So I just demonstrated that to you all. You're welcome for that. The independent variable here is whether or not there's a child under 18 in the household. The dependent
variable is planned spending on Christmas gifts, measured in dollars here. So what's the claim? That having a child under 18 in the household will have a causal effect on your planned spending on Christmas, and that you will be planning on spending more. So what does this researcher do in order to test this? The researcher runs a comparison of means, because the dependent variable is interval/ratio (planned spending in dollars on Christmas gifts) and the independent variable is categorical. She uses a t-test because she's comparing two groups: households with a child under 18 and households without a child under 18. Make sense? Good, let's keep going.

This is the result. What do we see? If a household does not have a child under 18, their planned spending on Christmas gifts is $673.24. If a household does have a child under 18, their planned spending on Christmas gifts is $836.45. That's a difference of about $163. So those who have a child under age 18 do plan on spending about $163 more on Christmas gifts than those who do not have a child under 18. Does that make sense?

Now, when you're describing and interpreting this, you can make some little additions. You could say whether you think that difference is big or not. In terms of the skills I want you to develop, at least get into the habit of saying how big that difference is. What's the actual value? A hundred sixty-three dollars. That's a fair amount of money. You can also put the sample sizes in there. You could say
there were 3,007 households that did not have a child under 18 and about 1,066 that did. You could add that information; that's all good stuff. I'm trying to keep it a little streamlined for you, but you could add that additional information. So the first thing we just did was describe and interpret.

What do we do next? Next we have to go down to the test for the equality of variances to determine if we're going to read the top row or the bottom row. How do we do that? We look at the p-value here. The p-value is .026. That's less than .05, so we read the bottom row, the one that does not assume equal variances. So what do we say here? We would say the t-test produces a t value of -2.144 with 1,185.6 degrees of freedom and an associated p-value of .032. What does that tell us? That there is a statistically significant difference between households with a child under 18 and households without a child under 18 in their planned Christmas spending, one that we would expect to find in the population. We therefore reject the null hypothesis.

Now, if we go back for a second, we said the null hypothesis is that there's no relationship between having a child under 18 in the household and spending on Christmas gifts. We rejected that null hypothesis. Is there support for the research hypothesis? Again, what was the research hypothesis? Households with a child under 18 will plan on spending more on Christmas gifts than those who do not have a child under 18. Is that what we found? Yes. So we also have to evaluate the research hypothesis. We did see that households with a child under 18 were planning on spending about $163 more on Christmas gifts than those without a child under 18, so therefore
there is support for the researcher's hypothesis. Makes sense? Let's write it up. What would it look like?

Households with a child under 18 plan on spending $836.45, and those without plan on spending $673.24. This is a difference of $163.21. Households with children under 18 plan to spend more on Christmas gifts than those without children. That's what we said; notice I added some of those additional things there.

Let's now evaluate our hypotheses. The independent samples t-test indicates that the variances are not equal, thus we read the bottom row. This shows a t value of -2.144 with 1,185.6 degrees of freedom and an associated p-value of .032. This is significant at the 95 percent confidence level, and thus we can reject the null hypothesis of no difference or no effect. In addition, we can say that there is support for her research hypothesis: households with children under 18 average more in planned Christmas spending than those without.

Boom, you are done. That's the worst; it sounds like what Shooter McGavin would say when he says boom. Yes, that is a Happy Gilmore reference; you're welcome for that. Does that make sense to everybody? I ran through an example, and this is some example language that you can use. I would have this slide handy and available when you are doing your homework, so that you get used to writing this, because I'm going to ask you to test hypotheses with t-tests in your homework as well.

You ready for an ANOVA example? Another good time for some water or the beverage of your choice. Let's do another one. A researcher is interested in racial differences in Christmas spending. She believes that Latinos will spend more on Christmas gifts than other racial groups. And here we actually have a
good race variable: whites, blacks, Asians, and others. I have no idea why I capitalized Asians there. Research hypothesis: Latinos will spend more on average than other racial groups on Christmas gifts.

So why are we running an ANOVA? Because we're actually comparing Latinos to whites, and to blacks, and to Asians, and to others. If my hypothesis was just that Latinos will spend more on Christmas gifts than whites, then it wouldn't be an ANOVA anymore; it would just be a t-test, Latinos versus whites, or Latinos versus blacks. Does that make sense to you? When we're making comparisons across multiple groups, more than two, we use the ANOVA. If we're just comparing Latinos versus Asians, then we do a t-test. So here we're using an ANOVA.

The research hypothesis: Latinos will spend more on average than other racial groups on Christmas gifts. The null hypothesis: there is no relationship between race/ethnicity and Christmas spending. Another way of saying it is that Latinos will spend the same as other racial groups. That's the same exact thing.

Here I also want to make something clear to you, and I would write this down if I were you. Oftentimes in research the null hypothesis is not explicitly stated. They just have a research hypothesis, and the null hypothesis is implied. Sometimes I will even write it that way. Your job is to know that the null hypothesis is there, and that you will still have to evaluate the null hypothesis and evaluate the research hypothesis even if they don't state what the null hypothesis is. Does that mean that if you don't see the null hypothesis you have to write it out? Well, on an exam, no, unless I ask you to. But you do know one thing: if you look at the tests of
significance and the p-value is less than .05, then you reject the null hypothesis; if it's above .05, then you fail to reject, regardless. You know that even if the null hypothesis is not stated, so make sure you know that.

So what are we going to do? Because the dependent variable is interval/ratio (this is dollars spent on Christmas gifts) and the independent variable is categorical, we use comparison of means. By the way, I'm using real data here, from a very large, nationally representative data set. People do actually collect data on this kind of stuff, and if you were a business major or a marketing major, this is an example of something that you might do. I, as an economic sociologist, would also do something like this, because I'm interested in spending patterns amongst different groups. I usually don't use something as blunt as race, because I'm normally interested in finer details than that, but still, this is like a real analysis that you would do, and this is real, nationally representative data with very large sample sizes. I didn't make it up.

Because we are comparing the means of more than two groups (Latinos, Asians, whites, blacks, etc.), we use ANOVA with the Beefaroni post hoc tests. Are we ready? This time, by the way, I'm going to run the ANOVA with my descriptive statistics, so I can actually look at, describe, and interpret the mean differences by group rather than just focus on the ANOVA. You'll know what I mean in just one second.

Here we go. Here are the basic descriptives. Here are the racial groups that we have in this data set, and here are the sample sizes for them all. There's the good stuff. You see that? How do I describe and interpret that? Those with two or more races, non-Hispanic, are planning on spending six
hundred and seventy-seven dollars on Christmas gifts. Those who are black, non-Hispanic, are planning on spending $707. Those who are Hispanic are planning on spending $628. Those who are other, non-Hispanic, are planning on spending $686. Those who are white, non-Hispanic, are planning on spending $729. At first it looks like there is no clear pattern here, but actually there is one: Hispanics are planning on spending less on Christmas gifts than every other racial group. Does everybody see that? Boom, pretty easy.

Then I can turn to the ANOVA. The ANOVA is just a test of whether any of those differences are statistically significant. What do we see here? The p-value associated with this ANOVA is .773. Therefore there are no statistically significant differences between these categories that we would expect to find in the population, and therefore we fail to reject the null hypothesis that race has no effect on spending on Christmas gifts.

Are we done there? No, we would also want to go and look at our Beefaroni results. It looks complicated now because we have more categories to compare. Here are the I versus J comparisons, here are the actual mean differences for the comparisons, and here's the significance for all the comparisons. What do you see running down it? It's 1. Is 1 more or less than .05? It's way more. So none of these differences are statistically significant. Are we clear? And so what would you say? That the Beefaroni tests show that there are no clear statistically
significant differences there. Are we clear about this? Let's go back and write this all up.

Looking at the comparison of means, we see that those who identify as Latino average $624.48 in planned Christmas spending. This is lower than the average for all the other racial groups. Ideally you'd write them all out, but this is fine.

Now let's evaluate our hypotheses. Always start with the null hypothesis. We can look at the ANOVA results to tell us if there are significant differences between these groups. The ANOVA results show an F score of .449, 4 degrees of freedom, and an associated p-value of .773. This is above the .05 critical value used for the 95 percent confidence level, thus we fail to reject the null hypothesis. There is not a statistically significant difference between these groups that we expect to find in the population. Or: there's not a statistically significant relationship between race and Christmas spending that we expect to find in the population. Same thing.

Research hypothesis: looking at the Beefaroni results, we see no racial/ethnic group comparisons show up as statistically significant either. Given that, and given that Latinos are lower than all the other racial/ethnic groups, we can conclude that there is no support for the research hypothesis. So we're using both the Beefaroni results and the descriptive statistics to help us evaluate our research hypothesis. Okay? Makes sense to you? We good?

Let's do one more ANOVA one. A researcher is interested in educational differences in Christmas spending. She believes that there will be a positive relationship between education and spending on Christmas. Thus her research hypothesis is: education is positively related to spending on Christmas gifts. The null hypothesis, then, is that there is no
relationship between education and spending on Christmas gifts. Because the dependent variable is interval/ratio and the independent variable is categorical, we use comparison of means. Because we are comparing the means of more than two groups, less than high school all the way up to PhD, we use ANOVA with the Beefaroni post hoc test. Are we good with the independent and dependent variables? The independent variable is education; the dependent variable is Christmas spending. That means we believe there is a causal relationship between education and Christmas spending, and that as education increases, it causes spending on Christmas gifts to increase. Make sense? The reverse is also true: if education declines, we would expect Christmas spending to also decline, or, more likely, amongst those who have lower levels of education we expect to see lower levels of Christmas spending.

Let's take a look at what we get here. Here are the descriptive statistics, and again we will want to give these means. Those who have less than a high school education averaged about $556 in spending on Christmas gifts. For those with a high school degree or GED it's $639; for a certificate or technical degree it's $599; for some college, including community college, it's $652; for an associate degree it's $646; for a bachelor's degree it's $844; for a master's degree it's $802; for a professional degree it's $1,042; and for a doctoral degree it's $1,062.19. Is there a general trend upwards? Yeah, I think so. It starts at 556, goes up to 639, then 599, 652, 646, 842, 802, 1,042, 1,062. So you do see a general trend upwards. As a matter of fact, the trend is actually quite powerful. It's not a clean trend, because it bounces around a little bit, but it is actually a large increase, because it goes from 556 and almost doubles to 1,062. Do you see that? So there is actually a relatively
strong trend upwards. It just bounces around a little bit. Would we expect to see a relationship between education and Christmas spending in the population overall? Let's look at our ANOVA results. What's the p-value we obtained? It's .000. Therefore we do think that there is a statistically significant relationship between education and Christmas spending that we would expect to find in the population, so we would reject the null hypothesis of no effect or no difference.

What do we do then? We look at our Bonferroni, excuse me, our Beefaroni results, all pasta goodness. Here I'm just giving you one part of the Beefaroni results, because with all these categories that table gets really, really large. Here are those with a bachelor's degree compared against everybody else. What do we see? If we compare those with a bachelor's degree versus everybody else, do we find something that is actually statistically significant? No: .662, .057, .717, .250, 1, 1, 1. One of these, .057, approaches statistical significance, but we don't actually see any clear statistically significant results when we do the individual-level comparisons. That happens sometimes, and that's okay.

I would write this down if I were you: we privilege the ANOVA results over the Beefaroni results. So we would still 100 percent reject the null hypothesis of no difference, because the ANOVA shows a statistically significant difference. However, when we're writing up and talking about our research hypothesis, we would bring up the fact that the individual comparisons do not show a statistically significant difference, but that there is a general trend overall that is consistent with a positive relationship. Does that make sense?
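One way to see how the overall ANOVA can be significant while none of the pairwise comparisons are is to look at what a Bonferroni correction does to each pairwise p-value. Here is a minimal sketch in Python, assuming the usual form of the correction (multiply each raw p-value by the number of comparisons and cap the result at 1), which may not match every detail of the software output shown on the slides. The raw p-values are made-up numbers for illustration, not values from the lecture's output, and the function name `bonferroni_adjust` is hypothetical.

```python
from math import comb


def bonferroni_adjust(raw_p_values, n_comparisons):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in raw_p_values]


# Nine education categories give comb(9, 2) = 36 possible pairwise comparisons.
n_groups = 9
n_comparisons = comb(n_groups, 2)

# Hypothetical raw (unadjusted) p-values for three comparisons; illustration only.
raw = [0.0016, 0.02, 0.40]
adjusted = bonferroni_adjust(raw, n_comparisons)

for r, a in zip(raw, adjusted):
    verdict = "significant" if a < 0.05 else "not significant"
    print(f"raw p = {r:.4f} -> adjusted p = {a:.3f} ({verdict} at .05)")
```

Under this assumption, with 36 comparisons a raw p-value has to be below about .0014 to stay under .05 after the adjustment, and any raw p-value above about .028 is reported as 1. That would be consistent with columns of 1s in a post hoc table even when the single overall test is significant.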
And this is just the fact that this is messy sometimes, and that's okay. Actually, if you've been paying close attention, what you might see is that the mean differences in the Beefaroni results don't always match up exactly with the mean differences from our descriptive tables, and that's okay. The Beefaroni results actually make some slight changes to some of the mean differences, and that's fine; I'm not going to ask you about that on an exam. All of this is simply to say that the thing I want you to take from this is that we privilege the ANOVA results when we're evaluating the null hypothesis, and that it's possible for the ANOVA test of the null hypothesis to come out differently than the post hoc tests. That's perfectly fine. We just roll with it, and we explain that to our readers. Are we clear?

So, describing and interpreting the results: looking at the comparison of means, we can see that at the lowest level of education, less than high school, respondents averaged $557 in planned holiday spending. As you move up the categories in education, the average generally goes up, to over a thousand dollars for those with professional and doctoral degrees. This suggests a positive relationship between education and planned spending on Christmas.

To test our hypotheses, we can look at the ANOVA results to tell if there are significant differences between these groups. The ANOVA results show an F score of 3.551, 8 degrees of freedom, and an associated p-value of .000. This is below the .05 critical value used for the 95 percent confidence level, thus we reject the null hypothesis. We would expect to see a positive relationship between education and spending on Christmas gifts in the population.

To evaluate our research hypothesis: looking at the Beefaroni results, we see no educational groups are significantly different from each other at the 95%
confidence level, despite what the ANOVA says. It happens. There does seem to be a general upward trend in spending based on educational level, providing some support for the research hypothesis. However, the post hoc test does not show clear significant differences, thus we might conclude that there is weak evidence for the research hypothesis. Good? Are we good?

So what have we learned with respect to comparison of means? Comparison of means is used when the dependent variable is either interval/ratio or a dummy variable and the independent variable is categorical, at either the nominal or ordinal level of measurement. If the independent variable only has two categories, or you only care about two categories and you're going to compare those two, you will use a t-test. If the independent variable has more than two categories and you're going to compare more than two of them, then use ANOVA with the Beefaroni (Bonferroni) post hoc tests. And as a reminder, for hypothesis testing we have our research hypothesis and our null hypothesis, and we have to evaluate both. We evaluate our research hypothesis using the analysis itself, the cross tabs and the comparison of means. We evaluate our null hypothesis using the inferential tests, the tests of significance: the chi-square, the t-test, the F test.

This has been a long lecture on comparison of means and testing hypotheses using comparison of means. I hope you learned something and found it kind of interesting. Have a good one, everybody.
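The reason given in this lecture for evaluating the null hypothesis with the one ANOVA test, rather than with the many pairwise comparisons, can be made concrete with a little arithmetic. This is a minimal sketch in Python, assuming each comparison is tested at the .05 level and, for the first figure, that the tests are independent of each other (a simplifying assumption, since pairwise comparisons that share a group are not truly independent). The function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(n_groups):
    """Number of distinct pairs among n_groups categories."""
    return comb(n_groups, 2)


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


# 3 groups: the lending example; 5 groups: the race/ethnicity spending example;
# 9 groups: the education spending example.
for k in (3, 5, 9):
    m = pairwise_comparisons(k)
    upper_bound = min(1.0, m * 0.05)  # the simple "add 5 percent per test" bound
    print(f"{k} groups: {m} comparisons, "
          f"error chance about {familywise_error(m):.3f} "
          f"(at most {upper_bound:.2f}); one ANOVA test: 0.05")
```

Under these assumptions, three comparisons already push the chance of at least one wrong call well above 5 percent, while the single F test keeps it at 5 percent, which is the argument for using the ANOVA to evaluate the null hypothesis.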