Hi, welcome to chapter 13. Today we are going to be looking at a case study on plastics, actually plastic pollution, and it will be a vehicle for introducing you to hypothesis testing for means: to see if a population mean equals a known constant, which would be a one-sample t test, or to see if two populations have the same mean. So, let's get started. Let's see, I'm going to share with you Desktop 1. Here we are. Okay. So, here we go.
So, here is our worksheet. And at first, I'd like you to think for problem number one. Can you think about... which corporations do you think contribute the most to polluting our environment with plastics?
So what do you think? Can you come up with what that is? All right. So pause and come back to me.
All right. Welcome back. So again, we're going to be looking at hypothesis testing of sample means, and we're going to be looking specifically at the t distribution, which you've already seen in the context of confidence intervals. Today it'll be in the context of hypothesis testing, or significance testing. We'll be defining the null and alternative hypotheses, which will be very similar to what you saw for proportions, and we will be identifying assumptions, which again will be very similar to hypothesis testing of proportions. Similar, but not quite the same. So we're going to be looking at a data set on plastics.
And it's going to be from this great effort, it was a volunteer effort, international volunteer effort to track plastics, and to track who created those plastics, which corporations created those plastics. So we're going to want you to look at the data set. But before we do that, I'd like you to look at this video.
So here goes just a quick video. Each year, Break Free from Plastic Changemakers gather to reveal the top plastic polluters trashing our communities. With brand audits, people collect plastic waste and document the brands on each item.
The corporations polluting the most places with the most plastic are named the world's worst plastic polluters. In 2020, almost 15,000 volunteers in 55 countries organized brand audits.
Waste pickers also participated to highlight how low-value plastic packaging makes it hard for them to earn a living. This year, the world's top plastic polluters are the Coca-Cola Company, PepsiCo, Nestle, Unilever, and Mondelez International. We will not keep cleaning up after corporate polluters. It's time they take full responsibility for the plastic pollution crisis and adopt real solutions.
They need to reveal the amount of plastic produced, drastically reduce it, and reinvent packaging for refill and reuse. Okay. Sorry. So if you had trouble seeing that video, the links to that video and to any data sets are in week 14 in your module. I haven't published this yet, but it's right here. If you see here, it's called "Break Free from Plastic clip." That's what I named it, but Break Free from Plastic is the organization that created this data set, or actually organized tons of volunteers, people like yourself, to create the data set. And if you look below the worksheet, it breaks down the
different categories and subcategories. So here's the big data set. It lists the country, so it's in alphabetical order. I don't know why it's not scrolling. There we go. So the first country is Argentina, and you can see this data is both 2020 and 2019 data. And then here are the parent companies, so they literally counted up the items by company. And then these are the different types of plastic. I have looked at the answer key on this, and in problem number three it asks you to count the different
plastic types, and I'm not an expert on this. But first let's look at question two. So here's a list of all of the different types of plastic. It's a little intimidating, but if you read them, like PS is polystyrene. So you know what that is.
And PP is polypropylene. You might not know what that is, but think flower pots, bumpers for cars. And then PET is polyethylene terephthalate, which is a polyester.
Sounds so sweet. So take a look at these on your own, because you have access to that link in the course modules. And then for number two: if you pick up any plastic item, you'll see it contains this code, and there'll be a number on the inside. What do you think the code means? So maybe pause and go look at a piece of plastic. So, welcome back if you paused. The codes are different types of plastics, and the different numbers indicate, I think, how easily they are recyclable.
But in this day and age, it's a little misleading, because so many plastics are not actually recyclable now, I believe, because China is not... I mean, with what we thought of as recycling, people were just shipping them to other countries and putting them in dumps there, and they weren't actually being recycled. So for the most part, when we create a plastic bottle, maybe it gets recycled once.
But after that, if you buy a shirt that's made out of recyclable plastic, that shirt then can't be recycled. So it's only a one-time thing for a lot of them. So we've got a lot of technology that we need to improve there.
So what do you think this means? What I'm going to say is these codes identify different types of plastic. And the type, in turn, tells you something about whether or not those plastics can actually be recycled and how many times they can be recycled.
And once you learn about that, it's kind of grim because it's not as happy a picture as you would think. Okay, so let's move on to number three. So I'd like you to look at that data set and you can access it.
by clicking here and you'll see the data set or you can go to your modules and click there if that doesn't work for you. So identify, list the different types of plastic. So I'm a little confused by this question because when I look at the data set. Get it now. So here I'm looking at the data set.
And these are different types of plastics. You can look at the codes in the sheet: PET stands for polyester (polyethylene terephthalate), PP stands for polypropylene, PS stands for polystyrene, and PVC stands for, well, I guess
They just said it was PVC. So if you look at the little chart just before problem number two, it breaks it down. So in this answer, when I look at the answer key, it says that there are only four different types of plastic, PET, PP.
PS and PVC. But if we look over here (I don't know which desktop I'm sharing, I'm being too complicated here), I also see HDPE and LDPE. And when I look at the codes, those are high-density polyethylene and low-density polyethylene.
So I'm going to add that in. And there are other weird headings here, like this one here that says empty. And when you look over, you see category left empty count.
And I'm thinking, when I used to use Excel to do your grades, not your grades, but previous students' grades, sometimes you couldn't actually do some of the calculations unless you had a whole column with zeros in it. So whenever somebody breaks down their data set for you, sometimes there are headings that are just confusing. You don't know why. And they're relics of other things.
So don't be intimidated by that. Just do the best you can. So I'm going to do the best I can.
And I'm just going to cover my bases here by typing in. So I'm now back over here. I'm going to type in: four types of plastic, but I also see two others, HDPE and LDPE, high- and low-density plastics.
That's my answer. So this is me kind of covering my bases, because I don't have the background to completely understand the data set. But that doesn't mean I can't go through it. So since the answer key says there's only four types, okay, I'm going to go with it. But I see two others, and I'm thinking maybe those are broad overarching types, perhaps.
I don't know. But this, whenever you're in a situation like this with your teacher, where you're like, I don't know if she wants this or this, just give me both things. And then it shows that you've done your due diligence.
The point of this question is to have you really look at the headings and make sure you understand which headings are which. The main thing is to look at the parent company, and that's what really reveals what's going on here. So the first data set that we're looking at, and I'm going to come back up here.
The first data set we're looking at, right? I'm waving my cursor here. This data set was compiled by thousands of volunteers who went through and picked up plastic and categorized it in many countries.
I can't remember how many countries they said. I think it was in the 50s. All right. Well. It's not there, but you can look it up.
So that's problem number three. For problem number four, we want you to look at this dashboard. So somebody, and her name is Sarah, took this large data set of the brand audit, of all the different types of plastics that were found within a short period of time across the globe, and she made a dashboard. And do you notice here where it says Shiny apps? She used the same program that is driving our Dana Center website. So click on there, or cut and paste, whatever you need to do to get that dashboard, and I will also put that link in your module in case you cannot click or paste. So I'm going to stop sharing here. I clicked on it, and it appeared in the other desktop that I have. I realize you guys may only have one little screen. So here is the dashboard. It takes that large data set and manipulates things very quickly, very much like the Dana Center tools, the DCMP Shiny tools that we've been using all semester long.
So what I'm going to do is in problem number four, it said, select a country that interests you. in the drop down menu. Well, you can select whatever country you want.
I'm going to select the United States because that's really the only country that we have control over in terms of legislation. So as you can see, there are two different years for this particular data set. If you're more interested in the Break Free from Plastic movement, there is information at the end of that video that I showed you. So here it is. And it's got the total recorded plastic items on a particular day when all the volunteers went out to record them.
And in 2019, there were a whole lot fewer plastics than in 2020. Now, why that is could be lots of reasons. Could be the pandemic, maybe. People had more time on their hands, but it also could be that we're just drowning in more and more plastic every year.
But what's interesting is up here: who were the great offenders? So the Kroger Company, I don't know what that company is, but I bet if you Googled it, you would find that it's lovely products that you love. And oftentimes companies that have a very friendly front-facing view, PR view, have some sort of a more innocuous name that you don't recognize for when they're being unpopular. So the Kroger Company, go ahead and Google that if you want.
They were the top offenders. PepsiCo. And then here is the Coca-Cola company.
And the Coca-Cola Company was the third offender in the United States, but it's a top offender globally. And it was identified, as you saw in that video, as one of the major offenders. So if you ever wanted to do a term paper on something else having to do with plastic pollution, this would be a great resource to start from.
Just Google Break Free from Plastic. You've got this dashboard, you've got all kinds of things, and you've got the main data set linked here. So, because I picked the United States, I think I'll just drag this over here. Your answer might be different, but what are the total number of plastics recorded in 2019 and 2020 from that country? So the totals for 2019 are right here, 4,314. 2019, 4,319 items.
So of course it's not all the items, this is a sample. This is not a population. And then in 2020, 9,000, almost 10,000, 9,957 items.
And they're all different types of plastics and so on. And what are the total number of plastics reported? Okay, so that just shows me that you've accessed the dashboard.
It's called a dashboard and it's a way of manipulating a different data set. Some computer programmer did all that work for us. And what are the companies reported to be the top polluters in 2019 or 2020?
So, I'm just going to do the answer. It says or, so that means I can. So, you may have picked a company that doesn't have one, you may have picked a country that doesn't have one particular top polluter, and you can say that, but I'm going to just do 2020. The Kroger Company.
And I really would love to look that up. PepsiCo. I wonder if that's Pepsi. It probably is. And Coca-Cola.
So you do you. Pick a country that's interesting to you. Maybe it's the US. Maybe it's a different one.
Oh, Trollsy. Don't want that. So just want that.
Okay. So those are my answers for that. And so what we're going to, what we have here for problem number five through eight, I have taken you through steps for if you would like to manipulate this data from this large data set.
So it has countries, it has, it's a very large data set. And you can see all the numbers of volunteers and the number of days that they organize. So seems like Argentina is really on the ball. And they had 24 events.
So you could do all of this. It starts here, if there's time, and it takes you through the steps on how to do that. And you could do that especially if you wanted to do a different term paper, a more in-depth term paper where you search the data sets on different questions.
But if you're short on time, I already went through the steps identified here to create the data set that focuses on a major polluter that comes out of the United States, and that's the Coca-Cola Company. So I have it, if you notice here. You may be able to cut and paste directly from the document.
If not, you can cut and paste here. If not, you can go to your course module and the link is there and it's the second data set. which has the word Coca-Cola in it. So I am going to call that one up, see if it will work right now. Bam.
So there it is. And this is only 50 items. If you count them, it has the original rows here, but it's only 50 items, all of them Coca-Cola, and it condenses it by country.
So it's manipulated the data. It's kind of condensed it nicely. So we're going to be using this data set and we're going to be using this data set to cut and paste into our beautiful Dana Center that we love. So, so we're right here. We are interested in answering the following research question.
So you're going to need to really focus on this research question throughout this activity. So what happened is that Coca-Cola came out with a public statement, a PR statement, with a claim about the average number of their items found in each country in 2020, during the audit or maybe from some other source, maybe not the Break Free from Plastic audit. So let's read it.
For the products by the Coca-Cola Company, is the average total plastics count found in various countries in 2020 different from the claim that 275 items were found for the Coca-Cola Company? So the Coca-Cola Company is saying it's only 275. The Break Free from Plastic people think that it's different from 275. That's what they're wondering. So you've got the data set here. So for problem A, based on the research question, are we interested in testing a proportion or a mean?
So. So just that's always your first question when you're entering hypothesis testing is, is this proportion or is this mean? So read the problem again and then decide what you think. Okay.
So because it says average total plastics count found in various countries, and average and mean, they mean the same thing, the answer here is mean. And so the keyword is average.
Another tip-off is that all of the raw data is numerical, or quantitative. We're looking right here at the grand totals of the different types. So for one country there were 3,268, quite a far cry from the 275 that Coca-Cola claims. Argentina is 44. But then I don't know what country this is here. Korea. Is that right?
Kenya. So big difference. I don't know why, but it's interesting.
The key words for deciding whether or not you're dealing with a proportion or a mean: well, here it's that the data is quantitative. So we're counting the number of items.
So how many did Kenya have? How many did, did Germany have? How many did you go by? And the answers are numbers, the number of items that were found during that audit.
If it was more like, do you have blue eyes or do you have green eyes, or what proportion of the items come from Coca-Cola, then, when you see proportion or percent, or the raw data is qualitative or categorical, that's a tip-off that you're doing a proportion test. But we're not. We're doing an average test. Okay. So now, for part B here, what distribution do you think we should use for hypothesis testing that would answer this question? The two distributions you really have to choose from are the normal distribution or the t distribution. So at this point I want to switch to actually drawing a picture of this, so we will stop sharing screen and then we'll share screen here. Yay. All right.
So for problem number... oh, that kind of gave it away. So we are dealing with means, and what distribution do you think would be used in the hypothesis test to answer this? So your choices are the normal distribution or the t distribution.
And remember, there's not just one t distribution. There are actually many t distributions, a different t distribution for every sample size. So the normal distribution is for when you are given sigma: you have some number here for your average, and you would have some standard deviation, and it would be known. That would be the normal distribution. The t distribution, that's going to look a little bit different. How's it going?
Good, I'm recording. Just give me a moment and I'll stop. So the t distribution, and specifically it's t with a subscript. This is the degrees of freedom, and if you remember, degrees of freedom are n minus 1. If you look at the data set and you count them, the rows are 50. So this would be t 49. And so that would be a different number. And if we standardize things to do our test, the centers are always zero when you standardize. And the spread for this one, the normal one, the spread for your test statistic would be one.
So you would go out one in both directions. And we would call this normal 0,1. So it always lists the center first and the spread second.
But here, whatever the standard deviation is, it's bigger than 1. And we use this distribution, the t distribution, if sigma, the spread of your original parent distribution, is not known. Now, the reality is that the data set we're looking at is a sample. So when we look and we get a spread, it's going to be s. The data set we're looking at is not the population.
The population would be if we knew every single piece of plastic that was thrown away in 2020 and were able to count it all up and figure out the numbers for every... we don't know that. The data set is not the population, so we don't know the true sigma x. So we'll use t 49 instead
of N(0, 1). So we're not going to use this one, because we don't know sigma x. We don't have enough information. So this is the one. And by not knowing sigma, we have more variability. We have to acknowledge that we're not just estimating what the true mean is, we're also estimating what the true standard deviation is. And note, for all mean hypothesis testing, we will use the t distribution instead of N(0, 1).
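As a rough numeric check on the idea that the t curve is a little wider than the standard normal, here is a minimal Python sketch. It relies on the textbook fact that a t distribution with more than 2 degrees of freedom has variance df / (df - 2); the helper name `t_std_dev` is made up for this illustration and is not part of any course tool.

```python
import math


def t_std_dev(df: int) -> float:
    """Standard deviation of a t distribution with df degrees of freedom.

    Uses variance = df / (df - 2), which only holds for df > 2.
    """
    if df <= 2:
        raise ValueError("the variance is only finite for df > 2")
    return math.sqrt(df / (df - 2))


n = 50        # number of rows in the Coca-Cola data set
df = n - 1    # degrees of freedom for the one-sample t test
spread_t49 = t_std_dev(df)

# The standard normal has spread exactly 1; t with 49 df is slightly wider.
print(f"df = {df}, spread of t = {spread_t49:.4f}, spread of N(0, 1) = 1")
```

With 49 degrees of freedom the extra width is small, which is one way to see why the t curves look more and more like the normal curve as the sample size grows.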
So all you need to do is read the problem on the exam and identify: am I dealing with means or am I dealing with proportions? If you're dealing with proportions, assume sigma equals the square root of p times (1 minus p), all over n. So we have a nice formula for that. We don't really have a nice formula for what
sigma x is. And so we can't calculate sigma x bar exactly. Okay. So if I say mean, you know you're dealing with t. It's a given.
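To make the proportions formula concrete, here is a small Python sketch of the spread of a sample proportion, the square root of p(1 - p)/n. The function name and the example values (p = 0.5, n = 100) are invented for illustration; they do not come from the plastics data.

```python
import math


def proportion_std_error(p: float, n: int) -> float:
    """Spread of a sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1.0 - p) / n)


# Made-up example: hypothesized proportion 0.5, sample of 100.
example_se = proportion_std_error(0.5, 100)
print(f"spread of the sample proportion: {example_se:.3f}")
```

The point of the contrast is that for a proportion the spread falls straight out of p and n, while for a mean the spread has to be estimated from the sample, which is what pushes us to the t distribution.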
Okay. So let's refresh our memory on the t distribution. When you're dealing with the t distribution, your test statistic is going to be a t test statistic, and it's observation, so some sort of average, minus the center of the hypothesized distribution, and I'm going to put a little naught here, mu naught, because it's what the powers that be say is the true average. And then you're going to divide it by... well, you're not looking at individual countries, how many plastic products they have. You're taking an average of many, maybe you're looking at 10 or 20 or 30 or 40 countries at a time.
So your spread is not sigma x, it's sigma x bar. And we learned in a previous class that that is calculated by the sample standard deviation chopped down, divided by the square root of the sample size.
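The test statistic just described, (sample mean minus mu naught) divided by (s over the square root of n), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_sample_t` is hypothetical, and the counts in `made_up_counts` are invented numbers, not the real Coca-Cola data.

```python
import math
import statistics


def one_sample_t(sample, mu_0):
    """t = (x_bar - mu_0) / (s / sqrt(n)) for a one-sample t test."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    x_bar = statistics.mean(sample)   # the observed average
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    std_error = s / math.sqrt(n)      # estimated spread of the sample mean
    if std_error == 0:
        raise ValueError("all observations are identical, so t is undefined")
    return (x_bar - mu_0) / std_error


# Invented item counts for eight countries; 275 is the claimed average.
made_up_counts = [120, 310, 95, 640, 275, 410, 180, 530]
t_stat = one_sample_t(made_up_counts, 275)
print(f"t = {t_stat:.3f} with {len(made_up_counts) - 1} degrees of freedom")
```

A positive t means the sample average sits above the claimed value, a negative t means it sits below; how far from zero counts as surprising is judged against the t distribution with n minus 1 degrees of freedom.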
So the bigger your sample, the smaller the spread of your sample mean. So there we go. And the conditions are, well, you are only sure that it settles down to this shape
as long as either n is big enough, the sample size is large enough, and this is the threshold, or n can be small, but only if the original data comes from a normal distribution. So if you know that the original data, the individual counts, follow a nice normal distribution, then you can take a really small sample. You don't have to work as hard.
So what are the conditions? Well, same old, same old conditions. For us to assume that we can do it, it's called a one-sample t test.
So your test statistic, right here, comes from the t distribution. You can only be sure of that if your original data, your original samples are random, and that you follow all of the good practices that we've studied in the first part of the semester on getting good samples. And so either your samples are random, or For some reason, you can feel confident that your samples are representative of the population that you're trying to understand.
So the samples are random or representative of the population. So what's the best way you know that you've achieved that? Because you're looking at data from a reputable source. And reputable sources, not Google. Google Scholar is a little better.
But reputable sources are like the sources that I've used throughout this semester. Notice I don't use Fox News and I don't even use CNN, which, of course, you know which one I read. But I want to make sure that the data that's put out there is fair and balanced and that it was, you know, there isn't an agenda. Well, clearly, the people who did this organization, the break free from plastic. Clearly they have an agenda.
So that's a little bit of a misspeak, but they are a reputable source. I know that they're not going to lie about their data, because they don't want to. The data is so overwhelming.
So that's the first condition. The second condition is that the sample size is big enough. And so we have it in the little box here. Now, the threshold for proportions:
it's np is greater than 10 and n times (1 minus p) is greater than 10. So probably the value in your head is 10: 10 successes and 10 failures. That is not appropriate for the one-sample t test.
For the one-sample t test, the threshold is that a sample size of 30 or more is usually considered to be large. So 30 or more is the rule of thumb, and you could see that your distribution settles down at 30 or more. But oftentimes scientists will use less than 30, and I'll just highlight this in a different color,
less than 30 if your distribution is approximately symmetric. I would say approximately normal. And how would you know that?
What you should do to begin with is you should look at the dot plot, the box plot, or the histogram to see what the shape is. So that's exactly what we're going to do right now. So What I am going to do, let's create a histogram.
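The two sample-size rules of thumb mentioned above can be written side by side as a short Python sketch. Both function names are hypothetical, and the proportion rule is coded as "at least 10" expected successes and failures; textbooks differ on whether exactly 10 counts, so treat that boundary as an assumption. Judging whether the data "looks roughly normal" still has to come from looking at a plot, so it is passed in as a plain yes/no.

```python
def t_test_size_ok(n: int, looks_roughly_normal: bool = False) -> bool:
    """One-sample t test: n of 30 or more, or smaller n with roughly normal data."""
    return n >= 30 or looks_roughly_normal


def proportion_test_size_ok(n: int, p: float) -> bool:
    """Proportion test: at least 10 expected successes and 10 expected failures."""
    return n * p >= 10 and n * (1 - p) >= 10


# The Coca-Cola data set has 50 rows, so the size rule is met even if skewed.
print(t_test_size_ok(50))                              # True
print(t_test_size_ok(12))                              # False
print(t_test_size_ok(12, looks_roughly_normal=True))   # True
print(proportion_test_size_ok(200, 0.5))               # True
```

Keeping the two checks as separate functions mirrors the warning in the lecture: the "10 and 10" rule belongs to proportions and should not be carried over to the t test.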
So I'm going to switch back. So let's stop sharing, and I'm going to share my other desktop. Because for whatever reason, when I tried to cut and paste a column of data from this plastics data set that I created, it's somehow speaking to the parent data set. So when I do it on the iPad, I end up getting thousands of terms, even though I can see that I'm cutting and pasting 50. I haven't been able to figure out that problem.
So we're going to do, we're going to switch back to this desktop and I want to. The directions say... So I'm looking at number six and number six says create a histogram of the grand totals. So the column I want for grand total is right here.
So I'm going to just highlight that column created. So I'm reading number six, create a histogram of the grand totals of all the plastics found in various countries in 2020. Um, and the, the, you can go to the link in question five or you can, yeah. So there's question five.
So we're going to be looking at this link right here and I have it. Oh, you can't see that. So, um, it's the, um, either link actually works. Um, Coca-Cola data. And if I can do this.
So, question five, all the order has changed on this because I've been typing in it. So, you can maybe click right on here. If that doesn't work you can cut and paste this.
If that doesn't work, go to your modules and there's a live link for sure in your modules. And it's the Coca-Cola data that we're interested in. So you're going to cut and paste.
So what number six says is: copy and paste the correct column into the correct DCMP tool. So notice I'm not actually giving you the link to the DCMP tool, because you need to know that you want to create a histogram of this data. At this point in the semester you need to know how to do that, because that's going to be on the final.
Create a histogram, here's some data. So I would like you to create a histogram. So pause and do that.
Take this column and cut and paste it into the appropriate Dana Center tool. So here's all of the tools; you should have this bookmarked. And this is actually the very first thing you learned in the class.
So go ahead and do that. Pause and come back. OK, so I remember that I created histograms in this part. This one is categorical data. This is going to be if I want to do bar charts or pie graphs.
That's going to be categorical slash qualitative data, not numbers. This next one is the quantitative. And these, we didn't really use this one. We didn't use this one. You don't need anymore.
And this one right here was to get you used to how. the binomial distribution worked. So really for the final, you need to be comfortable with these two.
So I'm going to click this one and it's quantitative data. So we're going to do that. And this is not from our textbook.
It's on our own, but it's, and so it's individual observation. It's not afraid. These are all individual observations. So I'm literally going to copy and paste. So copy.
and paste here. Command V for me, because I'm an Apple person. And it seems to know that this is not correct.
So, but I like to delete it anyway. It just bothers me that there's numbers in there. So here's my histogram. So if you are using an iPad and you cut and paste and you got thousands of values instead of 50, I would suggest you go to a computer like I'm doing and do it on that instead of your iPad.
So it's giving me the histogram and the box plot. Why not do the dot plot just for the heck of it too? The dot plot, so my dots are too big. So I'll make them just a little bit smaller. So each one is, it says that 250. I'm going to bring it down.
Ooh, that's too small now. This is a pretty small data set. So I think That's pretty helpful. And I can see how the dot plot and the histogram are pretty similar.
And if I hover, will it tell me what that is? No, but it will here. Oh, no, it doesn't.
There's just one observation that is between 4,000 and 4,500. There's one observation that is between 3,000, 3,500. Most of the observations are within here.
And remember, for histograms, you can't tell exactly where each observation is inside its bar. So, you know, it's possible that Coca-Cola was telling the truth when they said that the true average was 275. But if we go up here to our descriptive statistics, we have the actual sample mean and sample standard deviation of the individual observations. So we've got that there. So the question is, and I'm going to just pull this out of here so we can see what we're doing.
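If you wanted to reproduce by hand what the tool is doing here, a rough Python sketch might look like the following. It bins counts into bars of width 500 (the bar width is an assumption based on the bins read off the screen) and then computes the sample mean and sample standard deviation. The twelve values in `made_up_counts` are invented for illustration; they are not the 50 real grand totals.

```python
import statistics
from collections import Counter

# Invented grand totals: mostly small, with two very large values.
made_up_counts = [1, 12, 44, 80, 150, 230, 310, 475, 620, 980, 3400, 4100]

BIN_WIDTH = 500
# Map each value to the left edge of its bar, then count values per bar.
bins = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in made_up_counts)

# Text histogram: one star per observation in each bar.
for left_edge in sorted(bins):
    print(f"{left_edge:>5} to {left_edge + BIN_WIDTH:<5} {'*' * bins[left_edge]}")

sample_mean = statistics.mean(made_up_counts)
sample_sd = statistics.stdev(made_up_counts)   # divides by n - 1
sample_median = statistics.median(made_up_counts)
print(f"mean = {sample_mean:.1f}, s = {sample_sd:.1f}, median = {sample_median}")
```

In data shaped like this, a couple of very large values drag the mean well above the median, which is the same right-skew pattern the histogram and box plot show.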
Describe the shape and spread of the histogram. So go ahead and answer that question.
Note that a random sample of countries was taken each year, and the actual countries included vary from year to year. So this is a volunteer organization, so the data collection isn't perfect, but we can certainly see something going on there. And when I look here, I think we gathered for 2020 only. Yeah, so this is only 2020. So let's get this up. So what does the distribution look like?
Well, shape: skewed heavily to the right. Okay. And spread. We want to talk about the spread too.
So for spread, the standard deviation is kind of a scary thing to use when your data is skewed so heavily. So I'm going to give both of them, because I know I'll need the standard deviation for my hypothesis testing. But I won't be looking at this right now. This is the individual raw data.
So we're going to have to look at samples that are quite big. This is the original data. So spread, well, I could say the standard deviation s, the sample standard deviation of individual countries, is 726 items. So it's saying it's
better to use range since the data is so skewed. So I'm going to say it's between zero... well, if I look up here, I can see the minimum was one. One country only found one Coca-Cola
product. I'd love to know who that was. Maybe it's more than one country. Actually, it probably is more than one country.
It looks like it's all those countries. So between one and... these two seem like outliers. So I'm going to say most values are between zero and 1,000 items found for Coca-Cola, with two extreme outliers at...
So I could use the histogram, but it'll only give me the range, or I could use the dot plot. It's a little more precise. So I would say around 3,400. It's a guess. So I'll say not "at," I'll say "around," to indicate that I'm around 3,400.
And let's say 4,100, approximately, just to let the reader know that I'm not looking at the data, I'm looking at the histogram. I don't think it'll tell me the actual value. No, it doesn't tell me the actual value in here. Doesn't seem like that feature's turned on. So we've got two outliers. We can see for sure they're outliers. I wonder if that would tell me. Oh, look at that.
4,068. And the other one, it's not telling me because. So that was pretty good. 4,000.
Why not be as good as we can be? 68. Still approximately with the other one. Now it looks like it's more like 3,300, but we have other outliers.
So I'm just listing the two extreme outliers, but there are more outliers here. One, two, three, four, five. Five outliers out of 50. It's pretty significant.
So some of these would work out to be too far away from the pack. Okay. But I think I have done more than enough to answer this question. I think I answered that probably to death. So you might want to draw a sketch of that just to remind yourself, because the point of these notes is to help you remember what the lesson was.
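If you wanted to check a description like this on a computer, a minimal sketch might look like the following. The counts here are made up (they are not the Break Free From Plastic data), chosen only to mimic a heavily right-skewed shape with two extreme values. The 1.5 × IQR fence is a common textbook convention for flagging outliers, used here in place of judging by eye from the histogram.

```python
import statistics

# Hypothetical item counts for 20 countries (NOT the real brand audit data):
# mostly small values plus two extreme ones, to mimic a right-skewed shape.
counts = [1, 3, 8, 15, 22, 40, 55, 70, 90, 120,
          150, 180, 240, 310, 400, 520, 700, 950, 3400, 4068]

mean = statistics.mean(counts)
median = statistics.median(counts)
s = statistics.stdev(counts)            # sample standard deviation (n - 1 divisor)
data_range = (min(counts), max(counts))

# 1.5 * IQR upper fence: a conventional rule of thumb, assumed here for illustration.
q1, _, q3 = statistics.quantiles(counts, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = [x for x in counts if x > upper_fence]

print(f"mean={mean:.1f}, median={median}, s={s:.1f}, range={data_range}")
print("values above the upper fence:", outliers)
```

With skewed data like this, the mean sits well above the median and the standard deviation is pulled up by the extreme values, which is why reporting the range alongside s is the safer description.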
Okay, so the next question asks you to verify that the conditions are met for the one-sample t-test. It's called a one-sample t-test because we're looking at one sample. We've got 50 observations and we're going to look at the sample mean of that and compare it to what Coca-Cola says is the truth. So Coca-Cola gets to say, they get to set the stage, they get to set what is perceived to be the correct answer, which is 275. And we are testing, is that right?
Was it 275? I don't want to misspeak. I'm going back to the research question, problem number five. Yeah.
They say their average is 275. We're now looking at the data to see if they're actually being accurate about that. So verify the conditions. And it's broken down here.
Part A, are the samples random? And part B, is the distribution approximately normal, or is the sample size large? So pause and answer.
So hunt back and look and see, and pause and come back. So if you read it, we were actually told in the setup that the samples were random by the organization that organized this brand audit, Break Free From Plastic. So you can choose whether or not you trust them.
I trust them. I trust this organization's word that the samples were random, or represent the larger population of plastics. Okay.
So, and specifically, I trust... can you hold on just one second? Okay. Sorry about that interruption.
Let me go back to here. Okay, so I'm looking at this website, and I'm going to turn this blue so you can see my answers more clearly. So we were told the samples were random by the organization, Break Free From Plastic. I personally trust this organization. It's been published in several articles that are reputable, peer reviewed. Maybe you don't, but I do.
And I trust them more than what Coca-Cola would say. So it's all relative. So that's how I know that the samples were collected in an ethical way. And then the next thing you're going to want to check is, is the sample size large enough?
Well, so this histogram is your best guess of what the population would look like. So the original population is heavily skewed and has some outliers. So we're going to hope that a sample size of 50, because remember, our sample size is n equals 50, we are going to hope that is large enough.
It meets the threshold, the recommended rule of thumb of 30 for a t-test, a one-sample t-test, because we're only looking at one sample and comparing it to a known constant: what Coca-Cola says is the truth.
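To get a feel for why a skewed population makes us want a reasonably large sample, here is a small simulation sketch. Everything in it is an assumption for illustration: the "population" is simulated from a lognormal distribution (not the brand audit data), and the helper names are hypothetical. It compares how skewed the individual values are with how skewed the sample means are when n = 50.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; roughly 0 for a symmetric distribution."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

rng = random.Random(13)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population of per-country counts (simulated).
population = [rng.lognormvariate(5, 1.2) for _ in range(20_000)]

def sample_mean(n):
    """Mean of one simple random sample of size n from the population."""
    return statistics.mean(rng.sample(population, n))

# Sampling distribution of the mean for n = 50, approximated by 2,000 samples.
means_n50 = [sample_mean(50) for _ in range(2_000)]

pop_skew = skewness(population)
means_skew = skewness(means_n50)
print(f"skewness of individual values:   {pop_skew:.2f}")
print(f"skewness of sample means (n=50): {means_skew:.2f}")
```

In a run like this you would expect the sample means to be far less skewed than the raw values, though not necessarily perfectly symmetric, which is the same worry about wanting a somewhat bigger sample.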
But I'm going to say, given that there are these outliers, these two outliers here, and there's actually a bunch more here. We can see there's at least, and these could be overlapping too, one, two, three, four, five. I'm just going to say, honestly, I wish the sample were a bit bigger, because we know that at least 10% of our data, five out of 50 of our data points, are outliers.
It concerns me. I would want a bigger sample size to make it more stable, but technically it meets the threshold. Okay. Next: write out the null and alternate hypotheses that would be used to answer the research question in number five.
Remember to use correct notation. So this is kind of tricky because I'm encountering problems in Canvas and getting you to be able to type in the correct notation. It's just not available in Canvas. So instead, it's been multiple choice. On the final exam, I might have paper and pencil for this to just kind of bypass that.
So be ready to answer questions with a blank like this, as opposed to what you see in your Canvas homework. So I'm going to switch to handwriting. So we will leave here and we'll go to my beautiful iPad.
There we go. All right. So we are now on problem number eight.
So write the null and alternate hypotheses that would be used to answer the research question stated in question five. Remember to use the correct notation. So we know H naught is always going to be letter, symbol, number.
And the letter is going to be a Greek letter. So you have that, and H-A is going to be very similar.
Letter, symbol, number. So I'm going to say same letter and same number.
And for this symbol it's going to be either greater than, less than, or not equal to. It's going to be one of those. We know that the symbol in H naught is always equals, and we have to get the symbol down below from the context of the problem. So that's got to be context.
This has to come from context, and this has to come from context. So I'm going to go back to number five, because it's always good to be careful that you get your question right. So here's our research question.
For products reported by Coca-Cola, is the average... oh, bing, bing, bing, you already knew that, we already identified it, but it's good to just be thorough about it. So our symbol is going to be mu, in both hypotheses. And I know I'm using u in Canvas, because I cannot get Canvas to write this Greek letter, which is actually the Greek m. So my bad, but I was desperate.
And then, so we've got the average for the various countries. Oh, look at that. This is the research question. They're wondering if it's different. Now, I know that a logical thing would be to think that Coca-Cola is lying and it's actually way more plastic.
But the fact that it said "different from" means that's the alternate hypothesis. This is always the research question, what the researcher is really wondering. So they said different, so not equal to. And the other one is always equality. And the number that Coca-Cola threw out there, and I'm suspicious of it, they're saying it's a parameter, is 275. And we're wondering if it's not 275. So I think it's a good idea to define mu with a "where" statement.
So also clearly define the parameter listed in your hypotheses. So I'll write: where mu equals the average, or mean, number of plastic items per country originating from the Coca-Cola Company.
So if we wanted to know what percent was Coca-Cola, then we would be doing a one-sample test of proportions. And that would be a z-test, because we would have a better way of calculating the standard deviation of the distribution of p-hats, but we're not there. All right.
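With the hypotheses written as H naught: mu = 275 and H-A: mu ≠ 275, the test statistic for a one-sample t-test is the usual t = (x-bar − mu naught) / (s / √n). Here is a minimal sketch of that arithmetic. The values 275, 726, and 50 come from the lesson; the sample mean of 500 is only a placeholder, since the real value is on the worksheet and not quoted here, and the function name is hypothetical.

```python
import math

def one_sample_t(x_bar, s, n, mu_0):
    """Return (standard error, t statistic, degrees of freedom)."""
    se = s / math.sqrt(n)        # standard error of the sample mean
    t = (x_bar - mu_0) / se      # how many standard errors x_bar is from mu_0
    return se, t, n - 1

MU_0 = 275                # average claimed by Coca-Cola (H0: mu = 275)
S = 726                   # sample standard deviation of individual countries
N = 50                    # sample size
X_BAR_PLACEHOLDER = 500   # ASSUMPTION: stand-in value, not the real sample mean

se, t, df = one_sample_t(X_BAR_PLACEHOLDER, S, N, MU_0)
print(f"SE = {se:.2f}, t = {t:.2f}, df = {df}")
```

Because the alternate hypothesis is "not equal to," a t value far from zero in either direction would count as evidence against the claim of 275.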
So that was number eight. Number nine: what if we are interested in answering the following question using the same data? So here is a different research question. So I will pick a very different color. Let's pick lime green.
Oops, don't want to obliterate it. So, different question. For products reported by the Coca-Cola Company, is the average total plastics less than the average calculated in 2019? Is this a one-sample or two-sample test? So we're comparing what happened in 2019 versus what happened in 2020. And we're wondering... well, H naught's always about equality.
So this is a research question, so I'm going to go right to H-A and write that one first. Is the average total of plastics found in various countries in 2020,
so I'll say mu 2020, is it less than the average found in 2019? I can pick any subscript I want. So that would be the research question. And clearly they didn't... oh, we do want you to write it.
Oh, my bad. This should go down here.
So H naught, we haven't gotten to that one yet. For H-A, we're wondering if mu 2020 is less than mu 2019. So the research question is always the alternate hypothesis. So we're going to compare that to what's always, always, always... what do we always say for H naught?
Equals. So everything else has to be the same: mu 2020 equals mu 2019.
Now I think actually it was the other way around. If we looked at the data, there's no way this is going to pass. But as you can see, if we were going to look at this, we would have a column of numbers from 2020, and we'd have a column of numbers from 2019. So we'd have numbers, numbers, a whole column of data.
And what we'd do is grab all of these and calculate mu 2020. Nope, we would calculate x-bar 2020. And we'd do the same thing here. So this is data, data, data.
So we would add up all these numbers and divide by 50 and we would get x-bar 2019. And we would be comparing them. So this is sample one, and this is sample two. So we are definitely not in a one-sample test.
We're in a two sample test. And it would be a two sample t-test. And we'll talk about that in an upcoming section.
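As a preview of what "two columns of data, two x-bars" looks like in practice, here is a rough sketch. The two columns are made-up numbers, not the 2019 or 2020 audit data, and the unequal-variance (Welch) form of the statistic is an assumption on my part, since the lesson leaves the details of the two-sample t-test for a later section.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t statistic for mean(sample_a) - mean(sample_b), unequal variances allowed."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(var_a + var_b)

# Hypothetical per-country counts (NOT the real audit data).
items_2019 = [12, 40, 95, 150, 310, 620]
items_2020 = [20, 65, 130, 240, 480, 900]

x_bar_2019 = statistics.mean(items_2019)
x_bar_2020 = statistics.mean(items_2020)

# H-A is mu_2020 < mu_2019, so only a clearly negative t would support it.
t = welch_t(items_2020, items_2019)
print(f"x-bar 2019 = {x_bar_2019:.1f}, x-bar 2020 = {x_bar_2020:.1f}, t = {t:.2f}")
```

In this made-up example the 2020 mean is larger, so t comes out positive and there would be no support for "less than," which mirrors the remark that the real data likely points the other way.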
So it says: write the null and alternate hypotheses that would be used to answer this question. Define the parameters of interest using the correct notation. Oh, they want mu one and mu two.
All right. So we've got to follow the rules. So mu1 represents 2019. Okay, so we'll change that to mu1. And they want mu2 to be the population mean for 2020. So we've got to follow the rules, I guess. So let me make sure: the question is whether the 2020 average is less than the 2019 average.
Is that right? Yep. That's what they want us to say, even though, looking at the data, that's going to be ridiculous. So in order to do that, these have to be identical in both hypotheses, or you've really screwed things up.
So mu two versus mu one. So there we go. Now, define the parameters.
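Written out in the worksheet's notation, with mu1 as the 2019 population mean and mu2 as the 2020 population mean, the pair of hypotheses from this question would look something like this:

```latex
\begin{align*}
H_0 &: \mu_2 = \mu_1 \\
H_a &: \mu_2 < \mu_1
\end{align*}
```

The same two symbols appear in the same order in both lines; only the relation changes from "equals" to "less than."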
Okay. So, where mu1 equals the average... oh, I didn't do this: number of plastic items. So I've got to make sure that I've defined the parameter.
I just wrote "average number." But of what? Of plastic items from where? So I've got to state my population of interest:
all Coca-Cola plastic items collected from all countries, and we're doing mu1, so mu1 is 2019, yeah, in 2019. So, where mu1 is the average number of plastic items, calculated from all Coca-Cola plastic items collected from all countries in 2019. And per country. That may be overkill, but I'm adding "per country." Okay.
It's not beautiful, but I sure have explained to my teacher that I know what I'm doing, that I'm looking not at all plastics, but all plastics originating from Coca-Cola. And I'm looking at per country and taking an average of all of them. So do the same.
So if I had time on a test, I might make this a little better. But it's good enough. It's good enough.
And that's what statistics is all about, being good enough, not perfect. And so mu2 equals the true, I'll make this a little better, true population average, or mean, of all Coca-Cola plastic items found as trash, per country, for every country in the entire world in 2020. We're not going to be able to get that information.
We're never going to get that. So we're going to look at samples. And we were looking at samples.
We were looking at a sample of 50. And it gives us a good picture of what's going on in the world. And I think we're going to find that Coca-Cola wasn't being honest. So anyway, okay, we're done.
I will see you in the next video.