Well, hello there. Today we're going to do our second section on ANOVA, analysis of variance, and it is in-class activity 14b. And the question we're going to be asking is who studies the most, if anyone. So let me share my screen. Right.
So there was a study, you can read the name of the study, it claimed that college students spend 17 hours per week preparing and studying for their classes. So now this is the regular school year, not the summer school year where things are condensed. 17 hours per week, I think that's actually a little low.
And I might think that because I'm a STEM teacher. Statistics is a part of science, technology, engineering, and math. And I'm a little biased, but I think that STEM majors tend to put in more hours. They've got the labs and things like that.
But anyway, I shouldn't be stating my opinion. I should be asking your opinion. So the question is, do you think students with different majors, specifically... I'm going to color code this.
We've got specifically the arts and humanities. That's one major, though I know it might sound like more. STEM.
Education. And business. Do you think their study habits are the same or do you think they're different? And specifically, we're going to talk about the mean number of hours they spend preparing for classes. And I'm going to highlight that: mean number of hours they spend preparing for class each week.
So maybe that's not doing the homework. Maybe that's reading up ahead of time. Not quite sure, but is there a difference?
Explain what you think. So you get to state your own opinion here, but you do need to explain it. You do you.
I know in previous face-to-face classes, people had strong opinions about this. So, past students. This is just for us to know.
Past students believed STEM studied the most prior to class. Then who did they say was next? Business.
And after that it's not clear. They believed there was a standout at the top, but not which was which for the rest.
So they put that number one was STEM, number two was business. And then the arts and humanities and education majors, they thought, ah, they're probably more or less the same. So those are our preconceived notions.
Could be wrong, could be right. I don't know what yours is, but that's what we have from old classes. So one-way ANOVA is going to help us compare two or more means from different populations, to try and discern if they're the same or they're different. And we're going to be using technology to calculate the test statistic today.
And we're going to interpret those results in context. So that's what we're doing. Use the previous scenario to answer the following questions.
What is the question being posed? So we're going to gather up a whole bunch of majors and we're going to ask them a question. And what is this? I should say, what is the survey question being posed?
So what question are we going to ask each student? The survey question is, how many hours do you spend preparing for classes?
And we're not talking about years. So I'm going to say per week. I want to be really specific about it. So that's the survey question.
The research question would be a little broader than that. The research question would be, is there a difference in how many hours people study based on their major? So I don't think that's what they were asking.
But if you're on an exam and you're not sure what I'm asking, and you're online, then go overboard and just show me what you know. So the research question is, is there a difference in mean hours spent preparing among the different majors? And list them. I'm running out of room. I think they were actually asking for the survey question, but I do want to review for the final. So there we go. So how many groups are there?
Well, arts and humanities, STEM, education, business. Looks like there are four groups. Okay.
Which variable are you comparing between the groups? Be specific. So is the variable the mean?
No, the mean is not the thing that changes. There's going to be one population average for humanities majors, and it's not actually going to vary once you're done studying it.
The variable that's changing is the number of hours students prepare for class. So number of hours: everybody's going to have a slightly different answer. Number of hours of study time per week. That answer will be slightly different for every student that I interview.
Now, I probably shouldn't be the one interviewing because that would introduce some bias. That would introduce response bias. The way I'm gathering the data is going to encourage people to inflate their answers.
Okay, so now, so the variable is number of hours. From that, we're going to make a calculation, and the calculation we're going to make are means. And so this is going to be a little tedious.
I'm going to write out the first one and then I want you to pause and write them all out. So mu is mean. So this is going to be the mean number of hours spent preparing for class, and I think it's probably classes, for arts and humanities majors.
I know that sounds like more than one major. STEM is more than one major. Science, technology, engineering, and math, but we're kind of lumping them into one population. So that was the first one. That was the first group that we identified was humanities and arts.
Go ahead and write out the ones for the other majors. And make sure to list them in the order that was listed in the warm-up so that we don't get confused about which groups we're talking about. Okay.
Okay, so I filled mine in. But now that I'm looking at it, I realize that I'm missing something. Is it the mean number of hours spent per year? No, I think it's important that we put that in there.
So I'm just going to jam it in there. Mean hours per week. We'll just jam that in there per week, per week.
So I think you really want to be quite detailed about this.
You want to make sure that you describe what the mean is. So it's number of hours. And you want to make sure to describe for whom, so you want to list the population. Okay. So for number three, analysis of variance, I'm going to ask you now to write the null and alternate hypotheses.
And there's something really nice about this. This will be the same for all ANOVA tests. So you know how sometimes you have to carefully read, is it less than, is it greater than, is it not equal? And by "it," I mean the alternate hypothesis. This isn't true for ANOVA.
For ANOVA, it's very consistent. It is one thing all the time, or two. So the first thing, I'm going to answer it in symbols.
It's going to be H naught: the first average equals the second average equals the third average equals the fourth average. So these are population averages.
So that's H naught. And for HA, what's the statement? It's the weakest statement you can say that will break H naught, because you want to break it. You want to say, no, this is not true.
What's the weakest thing? It's not that they're all not equal. It's that one of them differs from the rest, or actually,
one of them just differs from one other one. There are some different ways you can say that, but what your book says is at least two of the population means are different. You're not saying which two. You're not saying who's bigger, who's smaller. You're just saying that of this whole group, one will not equal another one.
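Written in symbols, the two hypotheses look roughly like this. The subscripts are just labels I'm assuming for the four majors, in the order from the warm-up; they are a notation choice, not something taken from the activity sheet.

```latex
\begin{align*}
H_0 &: \mu_{\text{A\&H}} = \mu_{\text{STEM}} = \mu_{\text{Educ}} = \mu_{\text{Bus}} \\
H_A &: \text{at least two of the population means are different}
\end{align*}
```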
Just leave it as the weakest statement possible. So now what we did was we randomly sampled 12 students from each of the different groups. So we've got our humanities and arts, and then we have our STEM, and then we have education, and last we have business. So we randomly selected 12 students from each group so they represent their entire population, and we asked them how many hours per week they spent studying and preparing for class.
So here's the data. So this is real data. And I'm going to now switch. I'm going to go to my Dana Center tools and use the thing I love, the thing that makes things kind of fair, evens things out. So if you've had a terrible junior high experience and you don't know how to add fractions,
you can use technology, and that's what real statisticians do as well. So we're going to use technology. These are the steps we're going to do, but I'm going to need to cut and paste this into the tool, so go to here.
Um, and the directions are here. I'm going to, I'm going to expect you to refer to them. Um, Oh, actually let's do it. Let's do this.
Get it started. So, um, it looks like I'm already primed for that, but I'm going to pretend I'm not. So here's the, um, website, the generic website.
We're interested in multiple averages. So I'm going to go to ANOVA, click on that. And I think I'm going to transfer to the keyboard.
Sorry, sorry. Here we go. Okay.
So the first thing we're going to do is open it up. The next thing is enter data, provide your own.
So on enter data here, I'm going to pick provide my own. Enter the name of the response variable. So it's the variable, not the parameter.
So hours studying per week. Okay, now I think I may run into trouble here, actually, now that I remember correctly. Well, let's see. Hours per week.
Choose the number of groups. So it's going to be four. Type the names of the majors. So let's do it in the order, because if you look at the data,
if you click here and look at the data, it's in the exact order that is listed in your warm-up. So the first is Arts and Humanities. That's spelled right.
The second group is STEM. Science, technology, engineering, and math. The third group is education. Okay. And the last group is business.
Okay. So I got my names in. Choose the number of groups, type the names.
And now we're going to go to this data sheet, click on that, and the data sheet should pop up for you. It won't for me.
So I'm going to do this and hope I don't lose the information. I haven't lost it yet. Cool. We'll see.
So, so far there's no data. And I've got this in a Google Sheet that I shared with all of you. And so, so far, so good. So I'm going to
just go ahead and copy this and paste it. Oh, this is working out nicely. Presto bingo, got my first column. Isn't it nice that you can just copy and paste from a Word doc? And this isn't Word actually, this is a Google Sheet.
And you can do this for any term paper. It's free. It's right here. It'll make beautiful graphics for you.
You don't need to buy Excel, even though I think Excel is free right now. So copy, paste. This is education.
Good. Okay. Can't see it, but I know it's all there.
Copy and paste. So I really hope you're doing this with me because it's going to be this kind of activity that helps you. So I don't need this data anymore.
So I'm going to say goodbye to the data. Yay, it's still there. And I'm going to say hello again to my still there. Wonderful.
Okay, so we're back in business. So the next thing that it wants us to do is look at the descriptive statistics.
Do you think there's a difference? So I better find the descriptive statistics. So I'll just draw my arrow here until I get to it.
And there it is. There's all of the descriptive statistics. So I'm looking at just the descriptive statistics. And do you think there's a difference in the mean number of hours spent studying for each major?
Explain. So first I need to answer yes or no. So I'm going to look at the mean. So here's the mean for humanities. Here's the mean for STEM.
As we suspected, or as I suspected, it looks like STEM majors have the most studying time, they say, and then education and business. And the irony is that education majors, people who are going to go on to become teachers, have a significantly lower, well, I don't know if it's significantly lower, but I can definitely see that it's lower by two and a half hours. And if we go to business, I thought that the business majors would be studying more. They seem like a serious crowd to me.
But in college, it looks like they are the smallest, according to the sample. But is that difference in means significant enough? And remember from last time, we have to look at how much the means change from group to group to group, compared to the variation within each group. I don't think these dot plots are that helpful.
Oh, I have to answer the question. So let's see. I'm going to experiment with this and see if I can type here. Just start typing and hope for the best. So what did I say?
Looking at the descriptive statistics, do you think there's a difference in the mean? So I could say, yeah, I could say yes. And everybody can say a different thing here.
So is this text scribbled? It's going to let me type. I guess not.
Okay. So we're going to get out of that. Oh, maybe it will. No. No.
Okay. I'm abandoning that. All right.
So what do I think? I could say, and there's no wrong answer, some of you are going to look at the descriptive statistics and say the gap between the means, or averages, for STEM majors, 18.5, and, who was the lowest?
Business, 15.7, is almost three hours. That's three hours of blood, sweat, and tears studying.
So I think there will be a difference. So I'm looking at the actual gaps between the means. Now, if you're a little more sophisticated than I just was, you might say, well, let's look at the box plots.
So the next thing we want to do is choose box plots. I love looking at the box plots because then you see the variation within the groups a little bit more easily. So where is that?
Aha. I'll just draw my arrow here and find it. So I'm going to come here and change from dot plot to box plot. And oh, that's so much nicer to look at.
So that was me not being very sophisticated. If these are my thoughts, I'm saying I think we're going to reject H naught and say there is a difference between at least two of them. And the ones that I'm suspecting are business, which is the lowest, and STEM, which is the highest. But a more sophisticated person might go, well, wait a minute. There's variation within the groups that is pretty large.
Is it possible? Oh, I wish I could draw a line. Um, is it possible to draw a line? So just do this.
So here, do you see this? Do you see my cursor? If I could draw a line all the way through all of the boxes, maybe that's actually the true population mean for all of the groups.
And maybe the sample mean, well, this is the median. The sample mean is down here. But given that there is some overlap in all of these boxes, is it possible that the actual populations all have the same mean? So that would have been another very good answer for here.
So it's all good. Just state your opinion. We're going to use technology to actually answer it.
So we've chosen our box plots. And now for part A, look at the box plots. What differences and similarities do you see in the groups? So I think it's, I love using box plots to compare. So what do I see?
I'm going to pause and have you write down your thoughts on this, because again, lots of different good answers on this. So don't be unhappy if my answer is different than yours. So what do you, what did you come up with?
I wish I could talk to you about that. To me, it looks like education and business are really similar. So the similarities are, it looks to me, education majors and business majors are very similar: they have similar means and spreads.
Now, I do want to really emphasize that the box plots themselves do not tell you the means, but if you go down to the bottom and you see those little triangles, you can see that the blue triangle and the brown triangle are almost on top of each other. So that's a similarity. So I'm answering the similarities right now. Similarity. Any other similarities?
It does look like almost all of the spreads are similar, except for the STEM spread. So I'm going to write that down too. All spreads, and what am I looking at? I'm looking at the IQRs.
All spreads seem similar, except for the STEM majors. So the STEM majors seem to have the biggest spread of all. You can see that either by looking at the length of the box, which is the IQR, or just the whole box plot from the smallest point to the largest point seems a little more spread out.
Though now that I say that, so does business. Well, actually the ranges all seem the same. So, except for the STEM majors, who have a higher IQR. Or bigger IQR, because what does higher mean? Have a bigger IQR.
So that's, I'm following the directions and I'm just looking at the box plots. I'm not looking at the descriptive statistics. So similarity, I got the similarities and the difference.
I made a note of one difference, which is the STEM majors have a bigger IQR. Are there any other differences? Well, all the medians do seem slightly different.
Also, all medians seem at least slightly different. Okay, so we could be here all day with this question. Just show me that you notice some similarities and some differences, and you'll get full marks on the exam, as long as you don't overlook something glaring.
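If you want to check a box plot reading by hand, here is a minimal sketch of the numbers behind one box: the five-number summary and the IQR, which is the length of the box. The hours in it are made up for illustration and are not the class data, and the function names are my own. Different tools use slightly different quartile rules, so the tool's quartiles may not match these exactly.

```python
import statistics


def five_number_summary(hours):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # "inclusive" is one common quartile convention; other tools may differ.
    q1, median, q3 = statistics.quantiles(hours, n=4, method="inclusive")
    return min(hours), q1, median, q3, max(hours)


def iqr(hours):
    """Interquartile range: Q3 minus Q1, the length of the box."""
    _, q1, _, q3, _ = five_number_summary(hours)
    return q3 - q1


# Hypothetical weekly study hours for one group of 12 students (NOT the class data).
made_up_hours = [10, 12, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26]

print("five-number summary:", five_number_summary(made_up_hours))
print("IQR:", iqr(made_up_hours))
```

A bigger IQR means a longer box, which is what stood out for the STEM group.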
And I'm sure your answers might be different than mine. Based solely on the visual evidence, what do you predict the conclusion of the hypothesis test will be? Well, I think it's a toss-up here. Those lines are not looking so similar now that I'm looking at them, but then the spread, the chaos and the variation inside the boxes, is pretty big.
Those boxes are spread out. So I don't know. You could go either way. You really could.
Both answers are going to be acceptable. Looking at the medians, I would say they don't seem that different. Or actually, no, I would say they seem different.
But looking at the chaos in the boxes, I would say maybe they're all the same, and that any difference in the centers can be explained by the variation of all of the data. So it's kind of like
the chaos, the fact that there's lots of variation inside the groups, washes out any difference there is in the actual measures of center. So I don't know. You do you here. Both work, but form an opinion. Don't just say that you could get full marks either way. Okay, so now on to the test statistic. So the test statistic is going to have a letter and it's going to have a number. And I'm going to ask you for that on the exam. And it's going to have a p-value associated with it. So where do you find the test statistic? You're going to find that down here.
Okay, and if you move over, remember that the test statistic for ANOVA is a ratio: the variation between the group means divided by the variation within the groups. So the letter is an F, and you can see that at the top of the column there.
So it's an F test statistic. This is new. And the number that you actually get is 3.76. So that is the test statistic. And I'm reading it right off the chart and I'll give you a chart on the exam and you just need to know where to look as well.
And then the p-value, if your eyes move along, you'll see that the p-value, which is the probability of getting that observation or observations more extreme, is right there. So I hope you can see that looking in the chart and going across. So that's the F test statistic. And just by the way, I didn't ask you to write this down, but this is the mean square.
I forget what we called it, but this is the measure of the difference between the means. And then this down below is the overall error, basically, how big are your boxes? But the best answer is right there. Okay, so there is the F test statistic.
And there's the p-value. And that was all that was asked of you. But I want to kind of emphasize how that ratio happens.
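To see how that ratio happens, here is a minimal sketch of the F calculation in plain Python. The data in it are made up for illustration and are not the class data, and the function name is my own; the online tool does this same kind of arithmetic for you. With four groups of 12 students, as in our study, the degrees of freedom would be 4 - 1 = 3 on top and 48 - 4 = 44 on the bottom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total number of observations
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])

    # Variation BETWEEN groups: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation WITHIN groups: how far each value is from its own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # mean square for the groups
    ms_within = ss_within / df_within     # mean square error
    return ms_between / ms_within, df_between, df_within


# Hypothetical weekly study hours, four students per major (NOT the class data).
made_up_groups = [
    [15, 18, 17, 20],  # Arts and Humanities
    [19, 22, 16, 21],  # STEM
    [14, 17, 16, 15],  # Education
    [13, 16, 18, 15],  # Business
]

f_stat, df_b, df_w = one_way_anova_f(made_up_groups)
print(f"F = {f_stat:.2f} with {df_b} and {df_w} degrees of freedom")
```

A big F means the group means differ a lot compared to the spread inside the groups; an F near 1 means the difference between means is about what the within-group variation alone would produce.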
Okay, using a significance level of 5%, alpha. So this is your alpha. Right here is 5%.
What's your decision? So I want you to write it out for me. Go through the process, compare p-value to alpha, then make a decision.
And then, yeah, just the decision. And then we'll do the conclusion in the next one. So I hope you did that. I'm going to do it now.
So my p-value compared to my alpha. Alpha is given to you. You usually don't have a lot of control over what alpha is. So they're saying you can be wrong 5% of the time. Now notice my p-value is given to me as a decimal, not as a percent. So since that's a decimal, I better make sure my alpha is also a decimal.
So 5% converts to 0.05. Okay, I put a zero in front of the other one too, so that we're really comparing apples to apples. So who's bigger, the purple decimal or the gray decimal?
I would say the gray. So that's bigger. So the open mouth goes to the 5%.
So that tells me that relative to alpha, the p-value is small. So what does a small p-value tell you? The p-value measures how likely. So a small p-value means the data, the observation, is not likely if H naught is true.
So the big idea in statistics for hypothesis testing is you have a picture of reality, then you get the data, and if the data doesn't fit in the picture of reality, then you reject the picture of reality. You know your data is good, you know that you did random samples, you have faith in your data. So here, if my observation is not likely, assuming H naught is true, I am going to reject H naught.
And that always, as a researcher, makes me happy. So that's my decision with an alpha level of 5%. Next: carefully state your conclusion in context. So the conclusion is just like it's always been.
I believe there is, or I believe there is not, evidence for HA. But when you describe HA, make sure that you are actually describing the parameter of interest and the population of interest. So here goes. And by now I'm hoping maybe you're not totally reliant on the template.
So there is or is not enough evidence to support HA, whatever that is. So is it there is or there is not, given that I've rejected H naught? Rejecting H naught means we're accepting HA.
Yay! Throw a party, publish our paper, write to our old professors and tell them they were wrong and you just showed them up and then get tenure. So there is enough evidence to support. Now, HA, you can flip back to what was HA?
It's written right here. At least two of the means are different, but describe what the mean is, and then also describe the population. So, support that at least two of the sample means for hours of study, and I'm not going to say per class because I'm running out of room, for the different majors.
At least two of the sample means for hours of study for the different majors, I better put at least two, are different, with all four means. So that's very awkward, but I know I hit all the bases. I described the populations and I described the parameters. Did I? Is there something fatally wrong with what I just wrote, perhaps?
Yes, there is. We never make a conclusion about the sample. So I'm going to take a big red pen here and say two of the population means for hours of study for the different majors, the ones that were already mentioned, but I don't have time, are different.
So there now you've got Primo and it's awkward and you could have the order a little bit differently, but I feel pretty good about that. Suppose a friend concludes from the previous ANOVA test that the mean number of study hours was significantly different between all four majors, would this be a correct interpretation of the ANOVA results? Yes or no? And then explain. So you make up your mind is, does this look good to you?
Make up your mind first. There is a right answer here. And the answer is that the friend is wrong.
So would this be correct? No. The ANOVA test only tests to see if there is evidence
that at least two are different. It's a weaker statement. Might be three. I don't think we need this anymore, do we? At least two are different.
It does not tell you that more than two are different. More than two could be, but it doesn't test for it. Also, the ANOVA test can't tell you which groups are different.
I would suspect it's STEM compared to business. Business was the least involved in getting prepared. I don't want to offend any business majors. Of course, it's not about individuals.
It's about the group as a whole. So what we found out was contrary to what I said at the beginning. STEM is number one,
it appears, but ANOVA is not telling you that. And I had said I thought business would be number two. So STEM first and then business.
That was not correct. And in fact, the next highest, it looked like, was the arts and humanities, which might be a testament to people who major in arts and humanities actually doing so out of a love for the subject, not because they want to make a lot of money like the STEM or the business majors. So there you are. I was wrong about that, but that's okay.
That's why we love statistics: it shows you that. So don't be fooled. ANOVA can't tell you that they're all different or that three of them are different. It only tells you that at least two are. And it can't even identify which ones.
So it's inappropriate even now for me to say that STEM studies harder. You'd have to follow up with a different, more specific test. Okay.
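To get a feel for what a follow-up would involve, here is a small sketch that just lists every pair of majors that would need to be compared. The Bonferroni adjustment in it is my assumption of one common way to handle several comparisons at once; the lecture does not name a specific follow-up test, and this sketch does not run one.

```python
from itertools import combinations

majors = ["Arts and Humanities", "STEM", "Education", "Business"]

# Every pair of majors a follow-up procedure would have to compare.
pairs = list(combinations(majors, 2))

alpha = 0.05
# Bonferroni adjustment (an assumption here, not from the lecture):
# split alpha evenly across all of the pairwise comparisons.
adjusted_alpha = alpha / len(pairs)

for first, second in pairs:
    print(f"{first} vs. {second}")
print(f"{len(pairs)} comparisons, each checked at alpha = {adjusted_alpha:.4f}")
```

With four groups there are six pairs, which is part of why a single ANOVA result can't point to one particular pair.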
Last question. Suppose instead of an alpha level of 5%, you have a significance or alpha level of 1%. Would you come to the same conclusion? So we've got our p-value. And when I was learning this, I would just be so rigid about it.
P-value and alpha. So now they've changed. They had an alpha of 0.05,
but now they're going to have an alpha of 0.01 instead. We have the same p-value of 0.0173. And when you're comparing decimals like this, I like to do the apples to apples.
So since this is four places past, I'm going to give this two more. So basically now I can say, well, which is bigger, 100 or 173? It's pretty clear once you put in that to make the decimals the same in every way, and you absolutely do not want to do percent for alpha because the p-value is in decimals.
I can see that my p-value is actually bigger than my threshold, the cutoff for p-values that allow you to reject H naught. So it's all relative: relative to alpha, the p-value is bigger. And the p-value measures how likely.
It means the data, or observation, is more likely, assuming H naught is true. So if we shrink this, we can see here, oh, it was a while ago, wasn't it? Oh no, it wasn't. So here, the result was smaller than this threshold, but here that same result is bigger.
So, oh. Now it's deemed, relatively speaking, likely. So it fits in our picture of reality, which is H-naught. So that means we are going to, your book says, fail to reject H-naught. Such a harsh thing to say.
I'm going to say, or keep H-naught. So we're stuck with H-naught. So as a researcher, we're sad because we can't publish.
We can't tell the old scientists that we're right and they were wrong until another younger scientist comes along. I haven't fully answered the question though. It's saying, would you come to the same conclusion? No, I would come to a different conclusion there.
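The two decisions can be put side by side in a short sketch. The p-value of 0.0173 is the one read off the ANOVA table; the function name is my own, and I'm assuming the usual rule of rejecting when the p-value is less than alpha, with both written as decimals.

```python
def decide(p_value, alpha):
    """Compare the p-value to alpha, both as decimals, and return the decision."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


p_value = 0.0173  # read off the ANOVA table in this activity

# 5% and 1%, each converted to a decimal before comparing.
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.4f}, p-value = {p_value:.4f}: {decide(p_value, alpha)}")
```

Same data, same p-value, but the decision flips because the cutoff moved from 0.0500 down to 0.0100.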
It's almost the same though. Where is that conclusion? I'm looking for the conclusion. Okay, so I'm going to cheat a little bit. I don't want to have to write it all out again. Okay, copy, paste. There we go. And I'll just put in a "not." I hope it lets me write on this. There is not enough evidence to support that at least...
I'm going to do something else here to clean it up a bit. There we go. Raise a list.
They'll never know that I didn't write it out again. There we go. There is not enough evidence to support that at least two of the population means for the hours of study for the different majors are different. I'm going to go on. In fact, oops.
In fact, I am going to continue to believe all the groups' study times are more or less the same. That wasn't actually my belief, because I was a STEM major and my sister was an art major, and she had a really fun time in college going to parties, doing her art. But when she graduated, she was a checker for a number of years, and I went right into a high-paying job, and I love science and I love statistics.
So, um, so I was happy with the decision I made. She did land on her feet though. She's doing great now.
Um, and she is enjoying her art and she loves it. So let's go back and make sure we did everything that we wanted to here. We looked at one-way ANOVA, and we emphasized that it's comparing means, usually more than two. We used technology to do a one-way test. So I want you to know where to find your test statistic and what letter your test statistic is, which is an F.
And we interpreted the results in context. So I think we are done. And we showed that Bronwyn was wrong here. Business majors are not number two. Don't know what is number two, just that
there's a difference. Okay. So go ahead and take a little break.
And then if you can, within the next hour, start your practice. That would be ideal. Though do the practice before the deadline.
All right. Talk to you later. Bye.