Transcript for:
Understanding the Chi-Squared Test

Well, hello there. Welcome back. We are on in-class section 15D. Do you know what this means? This is our last section. This is your last video with me. Congratulations for making it this far. And I promise you that what you see in this video is exactly what you need to know, no more, no less, to do well on the upcoming final. So in-class activity 15D deals with something called the chi-squared test for independence. So I'm going to review independence and then I'm going to use an example, and it has to do with income level and education level, to illustrate what is useful about the chi-squared test of independence. And I know you had a little preview activity. I'll make some references to that. And by the end of this video, you should be very strong on what you will be expected to know for the final. Okay, let's get to it. Okay, so we're going to talk about this example. So we've got a two-way table, and I'm going to be using old ideas and reviewing, so you will see a two-way table plenty of times on the upcoming final. The data that we have comes from Pew Research Center, an excellent research center. If you ever have to do any papers in your other classes, Pew is an excellent resource. The table is displaying two categorical variables. So one of the variables is income level, and they have it listed as under $30,000. I'm going to call that poor. And then we have $30,000 to $74,000, well, almost $75,000. Now, living in Santa Barbara, it's going to be rough to have a medium standard of living with that salary. But nationally, it is middle class. And then in the same vein, $75,000 and up, today I'm going to class that as upper class. So this is based on income level. So the other variable is education level. So we've got a two-way table just like before. So, for example, there are two people out of this sample who have a postgraduate degree.
So that's a degree beyond college, and they make less than $30,000 a year. So two people out of how many? Out of a grand total. So this is the grand total. I think I'll pick a different color. Grand total. So out of a grand total of 1,286, there's only two people who have beyond a four-year college degree who are poor, basically. So the question is, make a conjecture. Do you think income level and education level are independent? Explain. So before you jump into that, let's review what independent means. I want the basic, most conceptual idea of independence, not the mathematical formula that you know to prove independence in a probability question. And I love to do things by example. So I'm going to have an example here. So we have sex at birth and eye color, and we have sex at birth and height as an adult. Okay. So one of these is an example of independent characteristics, and one of them is an example of dependent or related. So independence means not related. And I think when we're dealing with this, it gets tricky because of the double negatives that sometimes happen. So independent means, is it related or unrelated? It means unrelated. And I'm going to say, or no association. So I know you guys are going to get this right if I say no association. That is the more user-friendly term for independence. So we're learning the chi-squared test of independence. We could call that the chi-squared test of no association. So you're testing to see if variables have an association or not. So which one of these two pairs of characteristics is independent? Is it sex at birth and eye color, or is it sex at birth and height as adults? If I tell you that somebody is six foot five, do you have a pretty good idea of their sex at birth? You do, right? Men tend to be taller. Women tend to be shorter than men. Not all women are short and not all men are tall. But that is the trend. So these two variables here are clearly related.
So they are not independent. And that's where the double negative kicks in. So I'm going to say no association. Oops, I almost got it wrong there. I'm going to say association equals not independent. Okay. And so this one here, is there an association? If I tell you the sex of someone when they're born, can you predict what their eye color is? No, you can't. Men, women, we're all mixed up in terms of eye color. So these two over here have no association, which is the same as independent. So your homework is going to treat these interchangeably. And so when we're learning the chi-squared test of independence, we're also doing the chi-squared test of no association. And that's going to be your null hypothesis, that there is no association. That's going to be your baseline belief. So we've got this data set that was collected by Pew, which is an excellent research institution. And do you think that income level and education level are independent, meaning there's no association, yes or no? And you can explain. There is no wrong answer here because it's your opinion. But once we look at the data, there's going to be a pretty strong conclusion. So I'm a teacher, I'm a community college teacher. Of course, I think that education is linked to income. So I think there is an association, and that's kind of what drives why I get up every day and want to do more teaching, because I want to get as many people on the happy side of that line as possible. So that's me. Bronwyn believes there is an association between the two variables, between income and education. Therefore, she believes the two characteristics are not independent. Okay, so that's my belief. So for your class I'm required to cover the chi-squared test of independence. There are several other chi-squared tests, homogeneity.
Well, there's a whole bunch of them, and this is the only one we need to cover. That's why we're jumping into 15D. So because of that, we're skipping around a little bit. So I'm going to fill in some gaps as we go along. They were mentioned in the preview, but I'm going to really hit it hard on what you need to know for the final. So after this is over, you're going to understand the conditions and assumptions required for the chi-squared test of independence. So just to review, because I'm sure that's on your mind: for the one-sample proportion, two-sample proportion, two-sample mean and one-sample mean, where the means were the t-tests as opposed to the z-tests, the conditions we needed were that the samples are random and independent and that the sample sizes were big enough. And if you recall, big enough changed depending on what we were talking about. For proportions, np and n(1 - p) are both greater than or equal to 10. For the means, it was n greater than or equal to 30 if the data didn't come from a normally distributed population. So we're going to have the whole thing about sample size, and I'm going to write out exactly what it is, and we're going to have a thing about whether the samples are random. All right, there are limits to what we can conclude. And that was similar with ANOVA. With ANOVA, there were some limits about what we could conclude. If we rejected H naught, all we could conclude is that at least one of them is different. We couldn't say which average was the different one. We'd have to follow it up. So this is going to be very similar to that. So limits to what can be concluded, and we'll definitely go over that. We're going to use our beloved technology to perform the chi-squared test of independence. And we're going to interpret the conclusion in context. And this one, cross that one out. We don't need to do that one.
And I've already deleted the questions that had anything to do with that. All right. So we're going to conduct a chi-squared test of independence for the variables income level and education level. What are the null and the alternate hypotheses? So this was gone over in the preview, but we haven't talked about it. So I'm going to give you the generic null and alternate hypotheses. So in general, for the chi-squared test, H naught and H A are very similar. It is the chi-squared test of independence, so H naught is going to be that the two variables are independent. Independent. I should know how to spell that. And then H A is the two variables are not independent. And when I think not independent, what does that mean? That means they're related. So I'm just going to say that. The two variables have an association. So if I could rule the world, I wouldn't say it's a chi-squared test of independence, because I know independence sometimes gets muddled in people's heads. I would say instead, and I will accept this, and I think it's easier on the brain: H naught, there is no association between the two variables. That is the baseline belief. That's the null hypothesis. And by the way, this is actually a test of proportions, because if there's no association, then the proportions of people who have one characteristic versus another will all be equal. So it is still your friendly old H naught, a statement about equality. All of the proportions are equal. But this right here, this one or this one, is the standby that works every time. If I say chi-squared, this is the null hypothesis, regardless of the context, and you won't actually be required to state the context. H A is there is an association between the two variables. And I recommend the blue rather than the green. People tend to not get as confused.
And all you need to do is fill in the blank if I ask for it in context. So did I ask for in context in this one? I didn't ask for in context. So this is perfect. If I said in context, you would say there's no association between education level and income level. And the alternate would be, there is an association between education level and income level. So there we go. And I'm going to put a black box around this because what you see is what you get. There's no other possibility for what the null and alternate hypotheses could be. So you are guaranteed points in your pocket if you've listened to this video and you've studied these notes. All right, check whether or not the conditions are met to perform the chi-squared test of independence. So you first need to know what the conditions are before you can check them. So let's make sure that we know exactly what the conditions are. And there are three conditions. One, two, three. Three conditions. And they sound really familiar. The first condition is, oh, this one actually doesn't sound so familiar. This is not an official condition, but to use this type of test, the test of independence, it's important that the data represent the counts of two categorical variables. So it's got to be categorical variables. ANOVA is for numerical or quantitative, and chi-squared is for categorical. And they have to be measured for individuals in one sample. So what we mean by that is, do you see this table right here? We didn't ask one group of people what their income level was and another group of people what their education is. We asked one group of people both those questions. So they are one sample getting two different questions. And that's what that's saying right here: this is not an official condition, but to use this type of test, it has to be categorical and it has to come from one sample. And that sample should represent the population. So was this condition met? Yes. Yes.
Income level. Now, you could say, wait a minute, those sure look like numbers to me. Well, I'm going to go ahead and say this is poor, this is middle class, and this is rich or wealthy. So I've turned them into categorical variables, and that's what we're thinking about. And then the other ones here are obviously categories. Okay, so yes, the variables are categorical. And the questions were both asked to the same people. We're not comparing two different populations. We're comparing two different responses within one population. So it's met, for the reasons I've just explained. All right. The next condition, and see, this is where independence is a confusing term. So let's just ignore that. I'm going to cross it out. Is the sample a random sample, or is it a sample that is considered representative of the population? That's really the gist of it. Is it a sample that can be considered representative of the population? So what we're asking is, is this good data? Was it collected responsibly? And I'm just going to focus on the random condition. So we can assume the sample is random and well represents our population of interest, which is, if you go back and look, American adults. So we're going to assume that Pew's sample well represents all American adults. Why do we assume this? Do we know this? Because Pew says so. Now, if Fox News said so, I don't know if I would actually accept that. I would say, I don't know if that condition has been met. Because a lot of what Fox News presents is not news, it's opinions. And they've been shown to not have representative samples, certainly of all of the U.S. And probably the same can be said for, you know, some left-leaning sources. Mother Jones, I'm sure that resource might have some problematic samplings as well.
But because this is Pew, it's a well-balanced, well-respected think tank. So I'm going to go with it. All right. The last condition is, oh, it's our friendly old favorite: is the sample size large enough? The sample size must be large enough so that the expected count in each cell is at least five. So is that condition met? So we've got some things to unpack here. First of all, what does expected value mean? So in the context of proportions, the expected value is n times p. It's the number in your sample times the proportion that fit that description. So if we know that 10% of people are left-handed and we have 50 people, we're going to expect that five of them are left-handed. 50 times 0.1 is five. So that's how we would calculate the expected value. But in this context, the expected value of a cell, and I'm going to go through some examples of this, is equal to the row subtotal times the column subtotal divided by the grand total. What the heck is that? It's a formula. And I know I just gave it to you without a lot of explanation, but this is one where I think it's better for us to look at what is generated and then talk about what it is. Let's do a few of those. So let's look at the first one. So let's look at this cell. Two is sitting in a cell. Now let's see what two stands for. Two stands for there are two people in the whole sample. And here's our grand total. It's down here. Okay, so we have a sample of almost 1,300 people. And we asked each one of them, what's your income level? What's your education level? And only two people had high education level and low income level. Two people out of all of those. So let's compute the expected value right in the cell, okay? So this is the expected value for two. I'm going to say E2 equals, so two lives in this row, and it lives in this column, right?
So the row total is 56. And the column total is 425. And we're going to divide by the grand total of 1,286. And I'm going to want you to round to two places past the decimal. So when I work it out, and I'm going to put it right here, I get 18.51. So what that's telling me is if there isn't an association between income level and education level, then we would expect 18 or 19 people to be in that cell. We only got two. So our expected value is right here. Okay, for that little two sitting in that cell. So the expected value tells you what you should expect if H naught is true, H naught being there's no association. And now, if you want an explanation for why this actually works, the column total divided by the grand total is a proportion. So if you look here, 425 divided by 1,286 is the proportion of people who are poor. And then you multiply it by 56, so you multiply the proportion times the raw number and you get the expected value. So it really is np all over again. So if you wanted an explanation of the formula, there it was. So is that expected value at least five? Yes. So we have how to compute an expected value. I'm going to reserve green for that. But for the condition, the sample is large enough if every expected value is at least five. So what we have to do is we have to check it for every single cell. We did just one. So I'm going to ask you to do two more. We're not going to do them all yet, but I want you to do this one on your own. Actually, maybe we'll do that one together. And then after we're done with that one, I'm going to want you to do this one. Okay. So let's do it. It's kind of like playing bingo. So I'm going to do the expected value of the gray one.
So the expected value of 138 is going to be, if you look, the row subtotal. Bam. So that is, I'm going blind, 369, times the column total, which is this one right here. So what column does it live in? 420. Divided by the grand total. And the grand total is always the grand total. So it's 1,286. So if you run that through, what you get, I'm going to jam it in this little box, is 120.51. Hope I'm showing you the right one. Yeah, 120.51. Okay, so that one passed the test too. We'd have to do every cell though. So this is pretty tedious if you don't use technology. So on your own I'd like you to work out this one right here. So calculate the expected value of eight, which is in the lowest corner. It's in this cell right here. So pause and calculate it. Okay, so what did I get? So the expected value of eight is equal to 118 times 441 divided by that same grand total, and that equals 40.47. Okay, I want you to look at that. We're testing to see if there is a relationship. Assuming there is no relationship, we should expect this right here. And I want to write it in words here because this is really powerful to me. I'm going to say, if H naught is true. And what does H naught say? No association. You don't need to go to college. There's no association. You don't need to study hard. You don't need to pick a major that suits you. You can just drop out because there's no association. That's what H naught says. If H naught is true, we would expect to find almost 41, about 40 people out of the sample of 1,286 to be, and I'm looking here, this is this column and this degree, so they have those two characteristics, to be wealthy and, it's an and because it's a cell value, with no high school degree. We would expect to find 40 people out of that total sample if there's really no relation, if everything is proportionally equal. How many are there really? Eight. There's eight people.
So those others that are supposed to be in that category are not. They're not in the wealthy category. So that alone is causing me to think that there's a relationship, that there's an association, that they are not independent characteristics, because the eight is way too small compared to the 40. Okay. So that's what expected value tells you. And I guess you could go through and work out every single expected value. And that is actually what they want you to do. So the sample is large enough if every expected value is at least five. We know this condition is met because Bronwyn said so and didn't want to do the tedious calculations for every cell. Trust me. And what do you think we're going to do to get those expected values for real? Are we going to do it by hand? Are we going to use a calculator? No. What are we going to use? Technology. Oh, there it is. Technology to the rescue. So just to review, there are three conditions. One, you shouldn't apply this test unless you have categorical questions to be answered, not quantitative, all coming from one group as opposed to two groups. Two, the data was collected well: they're random samples, and there isn't bias, and you did a good job. And the last one is the sample size is big enough, meaning every cell's expected value will be at least five. It doesn't mean that the observation itself has to be at least five. Notice for this one here, there's only two people who are poor and have high levels of education. But the expected value was 18.51. So we're good. That met that threshold. We'd have to show it for every single cell. All of these have to be at least five. And if we were in the 1950s, we'd have to actually calculate it, but we're not. And if we were in the 1950s, I probably wouldn't be your math teacher, because I did well in math because I had a calculator. Okay.
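The expected-count formula from the lecture (row subtotal times column subtotal, divided by the grand total) can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool used in class. The function name `expected_count` is made up here, and only the three cells worked in the lecture are checked, because the full table is not reproduced in this transcript. The row subtotal for the 138 cell is taken as 369, which is an assumption: it is the value that reproduces the 120.51 stated in the lecture.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected count of one cell if H0 (no association) is true."""
    return row_total * col_total / grand_total


GRAND_TOTAL = 1286  # grand total from the lecture

# (label, row subtotal, column subtotal) for the three cells worked in the lecture.
# The 369 is an assumption: it is the row subtotal consistent with the stated 120.51.
cells = [
    ("postgrad, under $30,000", 56, 425),
    ("cell with observed 138", 369, 420),
    ("no high school, $75,000 and up", 118, 441),
]

results = {}
for label, row_total, col_total in cells:
    e = round(expected_count(row_total, col_total, GRAND_TOTAL), 2)
    results[label] = e
    # Sample-size condition: every expected count must be at least 5.
    print(f"{label}: expected = {e}, at least 5? {e >= 5}")
```

Checking the condition for the whole table would mean running the same function over all 15 cells, which is the tedious part the technology handles.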
So we're going to go and do this test of independence. And we're going to go to our lovely, beloved, wonderful free tool that I hope you use in the future. And we're going to follow these steps. So here goes. Okay, so make this bigger so we can see it. And I think I'm not going to mess around. I'm just going to start typing. I apologize for this. I'm left-handed. The world is not made for left-handed people. Okay. So where did this come from? Let me close that out. Okay. So at this point, all of these should be familiar to you. The F distribution is the ANOVA one, which we gave just a brief treatment, and the Poisson we haven't done, but you can read that on your own. It would be fun. So inference, comparing several groups. So we're not actually comparing several groups, we're just comparing one group with several questions. So tap on chi-squared, open it up. Okay. And you're given this data here. I know it looks a little messy, but it's in a two-way table. What's another word for it? So it's definitely not the textbook. So I'm going to click on there. And you want to make sure that you select test of independence. You want to make sure that you're doing this one right here and not this one. Goodness of fit is another chi-squared test that's not required, so I'm not teaching it. So our choices are individual observations or contingency table. So individual observations would be if you had a data set and you wanted to cut and paste it. We actually have the contingency table. Two-way table, contingency table, it means the same thing. So we're going to click on there. And so none of these names mean anything. So, the row. And if you read some of the data center stuff, they may say row and column, it doesn't matter.
It matters to me because I may say third row, second column, to ask to see that you know how to compute expected values. So don't get your rows and columns mixed up. So these are the rows. So here's one row. No, that's a column. Here's a row. Here's a row. Here's a row. So they're all education level. So we're going to throw education level in the first one. Please do this with me so that you're getting the benefit. And then the education is going to be, so this is really a lot of education. So postgrad. So those are your master's, your PhDs, maybe some technical degrees. But then there's college, which is a B.A. Then there's some college. I mean, they started college, but didn't finish. Then there's high school, didn't do any associate, nothing. And then there's no high school. So that's nothing at all. Okay. So the column variable name, and I understand it might be confusing what we mean by column, but it's like a column that supports something. So like the White House, I think, has some columns in front. So those are the columns. And the under 30,000, this and this, they're all income level. And just to really make sure that it's clear that income level is categorical, I'm going to use the terms that I put up here. So income level is going to be. And you can see they're making a nice little table for you as you go along. So I'm deleting this. I don't know what they're going to do. Category one, category three. They're like, what the hell? So, what the heck? So poor, the first column are the poor people. The second column are the middle class. And the third column is the wealthy. I know they're not really wealthy. People who make 75,000 are middle class for sure. But the way it's set up, and nationally speaking, this makes sense. So notice we're kind of ignoring the totals. That's not a part of this.
So you don't need to calculate an expected value of the subtotals and the total, because they're what's used to actually calculate the expected value. Okay, so then you've got to throw in all the data. Did I not separate by commas? I think I did. Okay, maybe it doesn't like, I'll just say middle instead of middle class. Okay, so interesting. It didn't like the two-word category. Did you guys see that? So this is why you need to do this as well so you don't get undone on the final. So now, pretty confident in that. And sometimes we just get glitched out. My iPad's having a little crisis. So I'm going to pause for a minute, so enter your data in yours and I'll meet you back here when my data is entered too. Okay, so my iPad was glitching out a little bit there, but it's good now. So I got the data from here. And notice, if we look at this other one here, we were focused on finding the expected values, right? That's not what you're doing here. You're just putting the actual raw data in. So orange is for observation. So I entered those by hand into the contingency table in the Davis Center. So I enter all those, and I hope I did it right. We'll see in a minute. So there it is. And look, it gives you a lot of information. So you're going to need to know what information is relevant. And this is hypothesis testing. There is no association. There is an association. H naught, H A. So you have an idea, you get a test statistic, which is a standardized score. And I've thrown it all in there. Oh, but I'm asking a question here. You notice, what are the degrees of freedom? I do want you to know what the degrees of freedom are. Degrees of freedom for a t-test is just n minus 1. But for this test, the degrees of freedom are going to be r minus 1 times c minus 1. r is rows and c is columns. Rows. Columns.
I think that's how you spell it. So how many rows are there? I'll do red for row. How many rows are there? One, two, three, four, five. Don't count the subtotals. So that's 5 minus 1 times, and then how many columns are there? Let's do, I don't know, blue for column. I don't know why, but 1, 2, 3. 3 minus 1. So that's going to be 4 times 2, which is eight. Okay. So we calculated the degrees of freedom. And now I know the degrees of freedom in the t-test, n minus 1, is a little easier to calculate. It's rating how reliable your results are and how much data you have, how complicated it is. So here there were five possible answers for what's your education level, and there were three possible answers for income level. So it's rating how granular your information is. And if you look over here, there's the eight done for you. So a lot of this is really quite tedious, but it's all done for you in the technology. But I do want you to know where it comes from. So what is the value of the chi-squared test statistic? Okay, so can you find it? Right here. You see that right there. So you go to, it's called Pearson's chi-squared test. And there it is right there. So we've got 285.58. I'm just going to make sure I entered my data right, because I did this before. Yes. So it's 285.58. So just write exactly what you see there. So there is the chi-squared test statistic. And while I've got the spotlight going, we'll get to B in a minute, but I'd like you to go straight away to what's the p-value. And do you see the p-value there? It says it's less than 0.0001. Wow. So I'm just going to write exactly what I see there. So my p-value is less than 0.0001. So if I were asking you to round to three places past the decimal, you would have to say zero, which is saying, if H naught is true, you would almost never see this kind of data.
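The degrees of freedom and the size of the p-value can be double-checked without the tool. What follows is a sketch under stated assumptions, not a description of what the tool does internally. For an even number of degrees of freedom, the chi-squared upper-tail probability has a closed form, and that shortcut is used here so that only Python's standard library is needed. Both function names are made up for illustration.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """(r - 1) * (c - 1); subtotal rows and columns are not counted."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-squared > x) for EVEN df only, using the closed form
    exp(-x/2) * sum over i < df/2 of (x/2)**i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half**i / math.factorial(i) for i in range(df // 2)
    )


df = degrees_of_freedom(5, 3)  # 5 education levels, 3 income levels
p_value = chi2_upper_tail_even_df(285.58, df)  # test statistic from the lecture
print(df, p_value, p_value < 0.0001)
```

The printed p-value is far smaller than 0.0001, which is why the tool only reports it as "less than 0.0001".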
Well, we see the data, we collected the data, we trust Pew, there's the data, the data exists. So we're going to not keep believing our old outdated belief. Anybody tells you, oh, you know, going to college is so outdated, and you with all that college debt, blah, blah, blah, they're lying to you. The truth is, we could be on the verge of another recession. And whenever we've been in recessions, the people who suffer disproportionately are the people who are already poor. And as you can see, the poor people are the people with less education. So I'm getting ahead of myself, though. So I've got my degrees of freedom. I have my test statistic. I do want you to know conceptually what that test statistic represents. So before we keep going with our hypothesis test, I do want to talk about this test statistic. And this is very scary notation, but I hope by now you're beginning to realize that higher education is full of scary notation, which might actually be quite basic concepts. And it's full of fancy language that might actually be really straightforward. Like, well, I can't even think of one. Confounding variable, lurking variable, or just variable. I mean, if somebody explains the language to you, it becomes apparent once you hear some explanations. So anyway, I'm going to just break this apart a little bit. So how many cells do you see up here? I see one, two, three. And then it goes four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15. There are 15 cells. So this summation means all 15. So let's just look at the first cell. It's going to be the observed value minus the expected value. And then what you do is you square it and then you divide by the expected value. And you did a little bit of this in the preview. That's just the first cell. Bam, we got that one. Move on to the second cell.
So it's going to be plus the observed value, eight, minus the expected value. Oh, damn, I don't have the expected value. I sure wish I did have the expected values. I wonder if technology can give me that. Wouldn't that be awesome? Let's see if they can. Oh, look here. Right here, I'm going to check this box. And did you see that a new column appeared right here? So the expected value for the next one is 18.3. It did all that work for me. The next one is 19.2. Do you see that? So for that 19.2, I don't have to do the work. I know that it's going to be row subtotal 56 times column subtotal 441 divided by grand total 1,286. But they did it for me. And I'm telling you right now, on the final I'll ask you to compute the expected value of a cell. And I'll give you the cell that I want you to compute. You'll see this on the homework as well. And I'm going to ask you to round to two places past the decimal instead of one so that I know that you actually computed the value. I might even do three places past. So watch out for that. But you'll have the answer staring you right in the face. So this one, for example, is going to be 39. That's the second one, the expected value 117. Okay, so you get the idea. So moving back here. For that second piece of the chi-squared test statistic, it's going to be 8 minus 18.3, and then you square it. And then you divide by 18.3. That's the second one. And then plus, you have to keep going. The third one will be the observed value, 46, minus the expected value, which is 19.2, squared, all over the expected value. And now I'm going to do something that your poor parents couldn't do. I'm going to go dot, dot, dot. And the very last one, I feel like I'm going blind here, is going to be eight minus 40.47, squared, all over 40.47.
And so what you're doing is you're measuring each one of these. This is the gap between what you saw and what you'd expect to see if H naught is true. Then you square it to make sure that you're not getting the positives and negatives to cancel each other out. And then you're dividing by the expected. That's a way of standardizing it. It's very similar to observation minus center divided by spread. The only difference is there's a square in the middle so that things don't get canceled out. And you're doing it for every cell. You're measuring the gap between the observed and the expected, the observed and the expected, the observed and the expected. And then you add it all together and you get this number. Where is it? 285.58. And this is your test statistic. So it doesn't look like the test statistics you've seen before, but it is a way of measuring how far your data is from what you'd see if your null hypothesis were true. And so here's your test statistic. The bigger your test statistic, the more of a gap there is between what you would expect to see and what you actually see. And that's a pretty big fat number. You can see how big it is by looking at this distribution here. Here's where most of your test statistics would be. Here's your observation. It's very far. Zero means, oh, I got exactly what I would expect if H naught is true. So the p-value is telling you what the picture tells you, which is, wow, that test statistic is really far away from what we would expect. So that was background, but you don't need to know all of that. And this is the last video. Most of our test statistics have been T and Z scores. On the final, most of your test statistics will be Z and T scores. What does Z tell you? How far from the center, how far from the hypothesized expected truth. T score does the same thing. And now this is just a twist on that. Why do we use it? Because it works. It's a good way of measuring how surprising your data is.
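To see just how far out 285.58 sits, here is a rough Python sketch of the right-tail area, which is the p-value. It assumes 8 degrees of freedom, on the guess that the 15 cells form a 5-by-3 table, so (5 - 1) times (3 - 1); that number is an assumption, not something read off the screen in this part of the video. The helper name is made up, and the shortcut formula inside it only works when the degrees of freedom are even.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom >= x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    # For even df the tail area is exp(-x/2) times a short finite sum.
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total


test_statistic = 285.58  # reported in the video
df = 8                   # ASSUMPTION: (5 - 1) * (3 - 1) for a 5-by-3 table
alpha = 0.05             # conventional 5% significance level

p_value = chi2_upper_tail(test_statistic, df)
print(p_value < alpha)   # True: the statistic is far out in the right tail
```

A test statistic of zero gives a tail area of 1, which lines up with the idea that zero means the data look exactly like what H naught predicts.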
That's what you want to do all the time, is measure how surprising your data is. So small p-value, do you keep or reject H naught with that kind of a p-value? So p-value compared to alpha. And I think they must have given you an alpha level. If they don't give you an alpha level, assume it's 5%. So alpha is 0.05, and the p-value shows 0.0001, but it's actually less than that. But I think we've got a picture here. So relative to our risk level, our significance level, is our p-value small or large? It's small. Small p-value means surprising data. And you can't get more surprising than this; it's really saying it's not going to happen. So what you do, instead of rejecting the data, which you know you have, you reject H naught. So it looks like I've repeated myself in here. Oh, dear. So same question. Okay. And I have the explanation. So we'll just end that there. Sorry about that. And then E, what's your conclusion? What conclusion do you draw? Well, you have a template. There is or is not evidence to support, and then there's HA, and you want to have the population of interest in it. So go check your template. You know the template for the conclusion. So is this evidence, and this is always about HA, so is there evidence to support HA? Oh my gosh, there's so much evidence. So there is overwhelming evidence to support HA. What does HA say? It's right here. There is an association between the two variables. There is an association between. And what were the two variables? Because now we've got to do it in context. Oh, association between, oh, between the two variables. Okay. So notice that in part E, I didn't ask for it in context. So now in part F, I do want it in context. So you need to mention the two variables and you need to mention the population of interest. So there is overwhelming, you didn't have to put that in, evidence to support that there is an association between education level and income level for who?
Maybe not for, I mean, like if you go to a country where wealth is just inherited all the time, a hundred percent, like maybe if you go to Saudi Arabia, maybe the people there who are super wealthy, they don't have to go to school. Maybe, I don't know. I think they do go to school too, though. But we're talking specifically about Americans. And was it adults? Who's the population of interest? American adults. That's who our study was about, American adults. So I described exactly what my parameters of interest were, and I described my population. So I'm good. I got that. Okay. I think we're done with this one. Let's get some more room there. Based on the results of this test alone, can you assure someone that if they pursue more education, they will have larger income? Explain. And there is a hint there. Since we have concluded that the education level and income level are not independent, now, so they're sticking with the not independent. I think it's easier to do no association, association. So association means not independent. I still have to think about that myself. So I'm going with that one. So since we have concluded that education level and income are, I'm just going to make this more user-friendly, associated. We have concluded that there's an association between them in some way. But do we know what that association is? Does the alternate hypothesis go out on a limb and say that if you become more educated, you're going to become wealthier? No. It just says that there is a relationship. That's as far as it goes. So no. All we know is there is an association, or they are not independent, but we don't know if one increasing means the other increases as well. We don't know the direction. There's not a less than or greater than, it's just that there's something there. Okay. Oh my gosh, this is our last question.
So we concluded from our hypothesis test that the two variables, income level and education level, are not independent. They are associated, but we do not know how they are associated. It could be that there's a third variable not included in our study that impacts the value of both of the variables we're considering. Such a variable is called a lurking variable. So we've called it a confounding variable in the past, or confounding. Give an example of a lurking variable that could arise when considering association of these two variables. So a lurking or confounding variable has to be associated with both the variables. So we've got education and income. So we're saying we can maybe predict if we know a little bit more, what's something else that might be feeding both of those? I mean, I am going to go out on a limb: more educated people, wealthier people, what could be something that explains that besides education causing the wealth? There's lots of answers here. One of the answers, I think, is family support. If you come from a family where they're all educated, then they're going to be more supportive of you when you're going through college. They're going to understand that sitting all day in class and listening to a professor is as exhausting as digging a ditch. And that when you come home from work, I mean, when you come home from school, you're exhausted and you need a break. And it's not like you were just relaxing. We know college is hard work. So we know that, especially if we've gone through it ourselves. So those of you who are maybe going through college for the first time in your generation, you're breaking a cycle. And that's amazing. That's great. But it does mean that you deserve more support, because you might have less support at home, or you might have overwhelming support at home.
But I think that family support is a lurking variable, where it might give you a leg up on education, and it might also give you a leg up on income. My husband and I, we bought at the top of the market, this was years ago. You know, we had low level jobs, and we would have lost our house if we hadn't had family to help us through the bad times. And so family support just kind of lifts everything. But you can find that in other ways, as long as you can get over it and say, I'm going to get help other ways. So family support is one way. I think wealth itself. If you're wealthy, you're more likely to go to school, even nowadays with financial aid. But there's lots of scholarships out there. So there's always ways to overcome the obstacles. What other lurking variables are there? I mean, going to school can be considered a luxury. And, well, anyway, I came up with two of them, family support and wealth. Okay. Those were my answers. So let's see how we did. What are the conditions and assumptions for a chi-square test? One, categorical variables, and both come from the same sample. Okay, that's one. Two, the data is good. It was randomly selected and it represents the population. And three, the sample size is large enough. And large enough means the expected value is at least five in every cell. Every one. Okay, there will be a question on that. So those are the conditions. So we took care of that one. Bam. What are the limits? We just went over that. If you reject H naught, you just show that there is an association. You don't explain what kind of association it is. That will require more tests. Do you know how to use technology to perform the chi-square test of independence? I just showed you. So if you have trouble on the homework, you've got this video to come back and look at. Interpret the conclusion. Um, so H naught and HA are always this, this works right here.
This works right here. You can take it to the bank that that's what the two hypotheses are going to be. And so, interpreting them, you just fill in the blanks with the exact variables, and in the conclusion you would throw in the population of interest. Okay, so we are done. Congratulations, you guys. It was lovely being your teacher, and I look forward to grading your finals. I look forward to seeing you in study sessions between now and the final exam. All right, and do the homework. You're done with your videos, but you have one more homework assignment to do, plus the highly recommended but not required final exam practice, final practice exam, practice final exam. Okay. Bye you guys.