Transcript for:
Hypothesis Testing in Plastic Pollution Study

Hi, welcome to chapter 13. Today we are going to be looking at a case study on plastics, actually plastic pollution, and it will be a vehicle for introducing you to hypothesis testing for sample means: to see if a population mean equals a known constant, which would be a one-sample t-test, or to see if two populations have the same mean. So let's get started. I'm going to share desktop one with you. There we are. Okay, here is our worksheet. First, for problem number one, I'd like you to think about which corporations you think contribute the most to polluting our environment with plastics. What do you think? Can you come up with an answer? All right, pause and come back to me.

All right, welcome back. So again, we're going to be looking at hypothesis testing of sample means, and we're going to be looking specifically at the t distribution, which you've already seen in the context of confidence intervals. Today it'll be in the context of hypothesis testing, or significance testing. We'll be defining the null and alternative hypotheses, which will be very similar to what you saw for proportions, and we will be identifying assumptions, which again will be similar to hypothesis testing of proportions, but not quite the same.

We're going to be looking at a data set on plastics. It comes from this great effort, an international volunteer effort to track plastics and to track which corporations created those plastics. We're going to want you to look at the data set, but before we do that, I'd like you to look at this video. So here goes, just a quick video.

Each year, Break Free From Plastic changemakers gather to reveal the top plastic polluters trashing our communities. With brand audits, people collect plastic waste and document the brands on each item. The corporations polluting in most places with the most
plastic are named the world's worst plastic polluters. In 2020, almost 15,000 volunteers in 55 countries organized brand audits. Waste pickers also participated to highlight how low-value plastic packaging makes it hard for them to earn a living. This year the world's top plastic polluters are The Coca-Cola Company, PepsiCo, Nestlé, Unilever, and Mondelez International. We will not keep cleaning up after corporate polluters. It's time they take full responsibility for the plastic pollution crisis and adopt real solutions. They need to reveal the amount of plastic produced, drastically reduce it, and reinvent packaging for refill and reuse.

Okay, sorry. If you had trouble seeing that video, the links to that video and to any data sets are in week 14 in your module. I haven't published this yet, but it's right here. If you see here, it's called "Break Free From Plastic clip". That's what I named it, but Break Free From Plastic is the organization that created this data set, or actually organized tons of volunteers, people like yourself, to create the data set. If you look below on your worksheet, whoops, if you look below, it breaks down the different categories and subcategories. All right, here's the big data set. It lists the countries in alphabetical order. I don't know why it's not scrolling. There we go, scrolling. So the first country is Argentina, and you can see this data is both 2020 and 2019 data. Then here are the parent companies, so they literally counted up the items for each company. And then these are the different types of plastic. I have looked at the answer key on this, and in problem number three it asks you to count the different plastic types. I'm not an expert on
this, but first let's look at question two. For question two, here's a list of all of the different types of plastic. It's a little intimidating, but if you read them: PS is polystyrene, so you know what that is. PP is polypropylene; we might not know what that is, but it's flower pots and bumpers for cars. And then PET is a polyester. Sounds so sweet. So take a look at these on your own, because you have access to that link in the course modules. And then for number two: if you pick up any plastic item, you'll see it contains this code, and there'll be a number on the inside. What do you think the code means? Maybe pause, go look at a piece of plastic, and have a look.

Welcome back, if you paused. The codes are different types of plastics, and the different numbers indicate, I think, how easily they can be recycled. But in this day and age it's a little misleading, because so many plastics are not actually recyclable now, I believe because China is not... I mean, what we thought of as recycling was people just shipping them to other countries and putting them in dumps there, and they weren't actually being recycled. For the most part, when we create a plastic bottle, maybe it gets recycled once, but after that, if you buy a shirt that's made out of recycled plastic, that shirt then can't be recycled. So it's only a one-time thing for a lot of them, and we've got a lot of technology that we need to improve there. So what do you think this means? What I'm going to say is: these codes identify different types of plastic, and they also identify whether or not those plastics can actually be recycled and how many times they can be recycled. Once you learn about that, it's kind of grim, because it's not as happy a picture as you would think.

Okay, so let's move on to number three. I'd like you to
look at that data set. You can access it by clicking here, or you can go to your modules and click there if that doesn't work for you. Then list the different types of plastic. I'm a little confused by this question. Here I'm looking at the data set, and these are the different types of plastics, and you can look up the codes in the sheet: PET stands for polyester, PP stands for polypropylene, PS stands for polystyrene, and PVC stands for, well, I guess they just said it was PVC. If you look at the little chart just before problem number two, it breaks it down. When I look at the answer key, it says that there are only four different types of plastic: PET, PP, PS, and PVC. But if we look over here (oh, I don't know which desktop I'm sharing, I'm being too complicated here), I also see HDPE and LDPE, and when I look at the codes, those are high-density polyethylene and low-density polyethylene. So I'm going to add that in.

There are other weird headings here, like this one that says "empty", and when you look over you see a category left empty with a count. When I used to use Excel to do grades, not your grades but previous students' grades, you couldn't actually do some of the calculations unless you had a whole column with zeros in it. So whenever somebody breaks down their data set for you, sometimes there are headings that are just confusing, you don't know why, and they're relics of other things. Don't be intimidated by that; just do the best you can. So I'm going to do the best I can and cover my bases here. Back over here, I'm going to type in: four types of plastic, but I also see two others, HDPE and LDPE, high- and low-density plastic. So that covers those; that's my answer. This is me kind of covering my bases because I don't have the
background to completely understand the data set, but that doesn't mean I can't go through it. Since the answer key says there are only four types, okay, I'm going to go with that, but I see two others, and I'm thinking maybe those are broad overarching types. Perhaps; I don't know. Whenever you're in a situation like this with your teacher, where you're thinking, "I don't know if she wants this or this," just give both things, and then it shows that you've done your due diligence. The point of this question is to have you really look at the headings and make sure you understand which headings are which. The main thing is to look at the parent company; that's what really reveals what's going on here.

So the first data set that we're looking at, and I'm going to come back up here, I'm waving my cursor here, this data set was compiled by thousands of volunteers who went through and picked up plastic and categorized it in many countries. I can't remember how many countries they said; I think it was in the 50s. All right, well, it's not there, but you can look it up. So that's problem number three.

For problem number four, we want you to look at this dashboard. Somebody, and her name is Sarah, took this large data set from the brand audit, of all the different types of plastics that were found within a short period of time across the globe, and she made a dashboard. Do you notice here where it says shinyapps? She used the same program that drives our Dana Center website. So click on there, or cut and paste, whatever you need to do to get that dashboard, and I will also put that link in your module in case you cannot click or paste. I'm going to stop sharing here. I clicked on it, and it appeared in the other desktop that I have. I realize you guys may only have
one little screen. So here is the dashboard. What it's doing is taking that large data set and manipulating things very quickly, very much like the Dana Center DCMP Shiny tools that we've been using all semester long. Problem number four says to select a country that interests you in the drop-down menu. You can select whatever country you want. I am going to select the United States, because that's really the only country that we have control over in terms of legislation. As you can see, there are two different years for this particular data set. If you're more interested in the Break Free From Plastic movement, there is information at the end of that video that I showed you.

So here it is. It's got the total recorded plastic items on a particular day when all the volunteers went out to record them, and in 2019 there were a whole lot fewer plastics than in 2020. Why that is could be lots of reasons. It could be the pandemic, maybe people had more time on their hands, but it could also be that we're just drowning in more and more plastic every year. What's interesting is up here: who were the great offenders? The Kroger Company. I don't know what that company is, but I bet if you Googled it you would find that it makes lovely products that you love. Oftentimes companies that have a very friendly front-facing PR view have some more innocuous name that you don't recognize for when they're being unpopular. So go ahead and Google the Kroger Company if you want; they were the top offenders. Then PepsiCo, and then here is the Coca-Cola Company. The Coca-Cola Company was the third offender in the United States, but it's a top offender globally, and it was identified, as you saw in that video, as one of the major offenders. So if you ever wanted to do a term paper on something else having to do
with plastic pollution, this would be a great resource to start from. Just Google Break Free From Plastic. You've got this dashboard, you've got all kinds of things, and you've got the main data set linked here. Because I picked the United States, your answer might be different. I think I'll just drag this over here. What is the total number of plastics recorded in 2019 and 2020 from that country? The total for 2019 is right here: 4,314, no, it's a 19, 4,319 items. Of course it's not all the items; this is a sample, not a population. And then in 2020, almost ten thousand: 9,957 items, all different types of plastics and so on. Okay, so that just shows me that you've accessed the dashboard. It's called a dashboard, and it's a way of manipulating different data sets; some computer programmer did all that work for us.

And which companies are reported to be the top polluters in 2019 or 2020? It says "or", so that means I can just do one. You may have picked a country that doesn't have one particular top polluter, and you can say that, but I'm going to just do 2020.
The Kroger Company, and I really would love to look that up; PepsiCo, I wonder if that's Pepsi, it probably is; and Coca-Cola. So you pick a country that's interesting to you. Maybe it's the U.S., maybe it's a different one. Okay, so those are my answers for that.

For problems five through eight, I have taken you through the steps in case you would like to manipulate the data from this large data set yourself. It has countries; it's a very large data set, and you can see the numbers of volunteers and the number of days that they organized. It seems like Argentina is really on the ball; they had 24 events. You could do all of this if there's time. It starts here, and it takes you through the steps on how to do that, and you could do that especially if you wanted to do a more in-depth term paper where you search the data set on different questions. But if you're short on time, I already went through the steps identified here to create a data set that focuses on a major polluter that comes out of the United States, and that's the Coca-Cola Company. You may be able to cut and paste the link directly from the document; if not, you can cut and paste here; if not, you can go to your course module, and the link is there. It's the second data set, which has the word Coca-Cola in it. I am going to call that one up and see if it will work right now. Bam, there it is.

So what this is: it's only 50 items, if you count them. It has the original row numbers here, but it's only the 50 rows that were all Coca-Cola, and it condenses them by country. So it has manipulated the data and condensed it nicely. We're going to be using this data set, and we're going to cut and paste it into our beautiful Dana Center that we
love. So we're right here. We are interested in answering the following research question, and you're going to need to really focus on this research question throughout this activity. What happened is that Coca-Cola came out with a public statement. Their claim is that, for every country, the average number of items found during the audit was about 275. Or maybe it was some other source, maybe not the Break Free From Plastic audit, but they came out with a PR statement about the number of items in every country. So let's read it: for the products reported by the Coca-Cola Company, is the average total plastic count found in various countries in 2020 different from the claim that 275 items were found for the Coca-Cola Company? So the Coca-Cola Company is saying it's only 275. The Break Free From Plastic people think that it's different from 275; that's what they're wondering.

You've got the data set here. For part A: based on the research question, are we interested in testing a proportion or a mean? That's always your first question when you're entering hypothesis testing: is this a proportion or is this a mean? So read the problem again and then decide what you think.

Okay, because it says "average total plastics count found in various countries", and average and mean mean the same thing, the answer here is mean. The key word is "average". Another tip-off is that all of the raw data is numerical, or quantitative. We're looking right here at the grand totals of the different types. For Argentina there were 3,268, quite a far cry from 275. But when you go down to the next country... oh, that wasn't Argentina. Argentina is 44.
But then, I don't know what country this is here. Korea, is that right? Kenya. So, a big difference. I don't know why, but it's interesting. The key point for deciding whether you're dealing with a proportion or a mean is that the data is quantitative. We're counting the number of items: how many did Kenya have, how many did Germany have, and so on, and the answers are numbers, the number of items that were found during that audit. If it were more like "do you have blue eyes or do you have green eyes?" or "what proportion of the items come from Coca-Cola?", then when you see proportion or percent, or the raw data is qualitative or categorical, that's a tip-off that you're doing a proportion test. But we're not; we're doing a test of an average.

Now for part B: what distribution do you think we should use for a hypothesis test that would answer this question? The two distributions you really have to choose from are the normal distribution or the t distribution. At this point I want to switch to actually drawing a picture of this, so we'll stop sharing this screen and share a screen here. Okay, yay. Oh, that kind of gave it away. So we are dealing with means, and what distribution do you think would be used in the hypothesis test for this? Your choices are the normal distribution or the t distribution, and remember, there's not just one t distribution. There are actually many t distributions, a different t distribution for every sample size. The normal distribution is what you use if you are given the spread: you have some number here for your average, and you have some standard deviation, and it is known. That would be the normal distribution. The t distribution is going to look a little bit different. How's
it going? It's good. I'm recording; just give me a moment and I'll stop.

So, the t distribution. Specifically, this subscript is the degrees of freedom, and if you remember, degrees of freedom are n minus one. If you look at the data set and count the rows, there are 50, so this would be t with 49 degrees of freedom, t49. When we standardize things to do our test, the center is always zero, and for the normal the spread of your test statistic would be one, so you would go out one in both directions, and we would call this normal zero comma one, N(0, 1). It always lists the center first and the spread second. But for the t, whatever the standard deviation is, it's bigger than one. We use the t distribution if sigma, the spread of your original parent distribution, is not known. The reality is that the data set we're looking at is a sample, so when we compute a spread, it's going to be s, since the data set we're looking at is not the population. If we knew every single piece of plastic that was thrown away in 2020 and were able to count it all up and figure out the numbers for every country... we don't know that. We don't know the true sigma, so we will use t49 instead of N(0, 1). We're not going to use the normal because we don't know sigma; we don't have enough information. By not knowing sigma we have more variability, and we have to acknowledge that we're not just estimating what the true mean is, we're also estimating what the true standard deviation is. So note: for all hypothesis testing of means, we will use the t distribution instead of N(0, 1).
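To make the comparison between the t distribution with 49 degrees of freedom and the standard normal concrete, here is a minimal sketch in Python using only the standard library. The function names `t_pdf` and `normal_pdf` are my own labels for illustration and are not part of the worksheet or the DCMP tools; the formula inside `t_pdf` is the standard Student's t density.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution N(0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

n = 50        # number of rows in the Coca-Cola data set, per the lecture
df = n - 1    # degrees of freedom for the one-sample t-test

# Compare the two curves at the center and out in the tails.
for x in (0.0, 2.0, 3.0):
    print(f"x = {x}: t_{df} density = {t_pdf(x, df):.5f}, "
          f"N(0, 1) density = {normal_pdf(x):.5f}")
```

Running this should show the t curve sitting slightly lower at the center and slightly higher in the tails, which is the extra variability that comes from estimating the standard deviation.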
So all you need to do is read the problem on the exam and identify: am I dealing with means, or am I dealing with proportions? If you're dealing with proportions, assume sigma equals the square root of p(1 − p)/n, so we have a nice formula for that. We don't really have a nice formula for the spread of the individual x's, sigma x, and so we can't calculate the spread of the sample mean, sigma x-bar, exactly. So if I say mean, you know you're dealing with t. It's a given.

Okay, so let's refresh our memory on the t distribution. When you're dealing with the t distribution, your test statistic is going to be a t test statistic. It's the observation, so some sort of average, x-bar, minus the center of the hypothesized distribution, mu, and I'm going to put a little naught here, mu-naught, because it's what the powers that be say is the true average. Then you divide by the spread. You're not looking at individual countries and how many plastic products they have; you're taking an average of many countries, maybe 10 or 20 or 30 or 40 countries at a time. So your spread is not sigma x, it's sigma x-bar, and we learned in a previous class that that is estimated by the sample standard deviation chopped down, divided by the square root of the sample size. So the bigger your sample, the smaller the standard deviation of the sample mean.

And the conditions: you are only sure that the distribution settles down to this shape as long as either n is big enough, the sample size is large enough, and this is the threshold, or n can be small, but only if the original data comes from a normal distribution. If you know that the original data, the individual counts, follow a nice normal distribution, then you can take a really small sample; you don't have to work as hard. So what are the conditions? Well, the same old conditions. For us to assume that we
can do it, and it's called a one-sample t-test, where your test statistic comes from the t distribution, you can only be sure of that if your original samples are random and you follow all of the good practices for getting good samples that we studied in the first part of the semester. So either your samples are random, or for some reason you can feel confident that your samples are representative of the population that you're trying to understand. What's the best way to know that you've achieved that? Because you're looking at data from a reputable source. A reputable source is not Google; Google Scholar is a little better. Reputable sources are like the sources that I've used throughout this semester. Notice I don't use Fox News, and I don't even use CNN, which of course you know is the one I read, but I want to make sure that the data that's put out there is fair and balanced and that there isn't an agenda. Well, clearly the people who organized this, Break Free From Plastic, have an agenda, so that's a little bit of a misspeak. But they are reputable. I know that they are not going to lie about their data because they don't need to; the data is so overwhelming. So that's the first condition.

The second condition is that the sample size is big enough, and we have the threshold in the little box here. Now, for proportions it's np greater than 10 and n(1 − p) greater than 10.
So probably the value in your head is ten, ten: ten successes and ten failures. That is not appropriate for the one-sample t-test. For the one-sample t-test, a sample size of 30 or more is usually considered large. So 30 or more is the rule of thumb, and you can see that your distribution settles down at 30 or more. But oftentimes scientists will use fewer than 30 if the original distribution, and I'll highlight this in a different color, is approximately symmetric; I would say approximately normal. And how would you know that? What you should do to begin with is look at the dot plot, the box plot, or the histogram to see what the shape is. That's exactly what we're going to do right now: let's create a histogram. I'm going to stop sharing and share another desktop, because for whatever reason, when I try to cut and paste a column of data from this plastics data set that I created, it's somehow speaking to the parent data set. So when I do it on the iPad I end up getting thousands of terms, even though I can see that I'm cutting and pasting 50.
I've been able to figure out that problem, so we're going to switch back to this desktop. I'm looking at number six, and number six says to create a histogram of the grand totals. The column I want for grand total is right here, so I'm going to highlight that column. Reading number six: create a histogram of the grand totals of all the plastics found in various countries in 2020. You can go to the link in question five. So there's question five, and we're going to be looking at this link right here. Oh, you can't see that. Either link actually works; it's the Coca-Cola data. The order has changed on this because I've been typing in it. You can maybe click right on here; if that doesn't work, you can cut and paste this; if that doesn't work, go to your modules, where there's a live link for sure. It's the Coca-Cola data that we're interested in.

What number six says is: copy and paste the correct column into the correct DCMP tool. Notice I'm not actually giving you the link to the DCMP tool, because at this point in the semester you need to know how to create a histogram of this data. That's going to be on the final: here's some data, create a histogram. So pause and do that. Cut and paste this column into the appropriate Dana Center tool. You should have this bookmarked, and this is actually the very first thing you learned in the class. So go ahead and do that; pause and come back.

Okay. I remember that I created histograms in this part. This one is categorical data; this is what I'd use if I wanted to do bar charts or pie graphs. That's going to be categorical
slash qualitative data, not numbers. This next one is the quantitative one. We didn't really use this one, and this one you don't need anymore, and this one right here was to get you used to how the binomial distribution works. So really, for the final, you need to be comfortable with these two. I'm going to click this one, since it's quantitative data. This is not from our textbook, it's our own data, and these are all individual observations, not a frequency table. So I'm literally going to copy and paste here, Ctrl-V for me because I'm an Apple person. It seems to know that this is not correct, but I like to delete it anyway; it just bothers me that there are numbers in there. So here's my histogram. If you are using an iPad and you cut and pasted and got thousands of values instead of 50, I would suggest you go to a computer, like I'm doing, and do it on that instead of your iPad.

It's giving me the histogram and the box plot. Why not do the dot plot too, just for the heck of it? My dots are too big, so I'll make them just a little bit smaller. It says that each one is 250.
I'm going to bring it down. Oh, that's too small now. This is a pretty small data set, so I think that's pretty helpful, and I can see how the dot plot and the histogram are pretty similar. If I hover, will it tell me what that is? No, but it will here. Oh no, it doesn't. There's just one observation that is between 4,000 and 4,500, and there's one observation that is between 3,000 and 3,500. Most of the observations are within here. And remember, with histograms you can't tell exactly where each value is. So, you know, it's possible that Coca-Cola was telling the truth when they said that the true average was 275. But if we go up here to our descriptive statistics, we have the actual sample mean and sample standard deviation of the individual observations.

So the question is, and I'm going to pull this out so we can see what we're doing: describe the shape and spread of the distribution in the histogram. Go ahead and answer that question. Note that a random sample of countries was taken each year, and the actual countries included vary from year to year. This is a volunteer organization, so the data collection isn't perfect, but we can certainly see something going on there. And if I look here, I think we gathered data for 2020 only. Yeah, so this is only 2020.
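If you want to see roughly what the histogram tool and the descriptive statistics panel are doing behind the scenes, here is a minimal sketch in Python. The list `counts` is made-up placeholder data, not the real 50-row Coca-Cola data set, and the bin width of 500 is my assumption based on the bins read off the histogram above; the DCMP tool may compute things differently.

```python
import statistics
from collections import Counter

# Placeholder counts for illustration only, NOT the real 50-row data set.
counts = [1, 15, 44, 95, 120, 230, 310, 560, 3268, 4068]

# Group each count into a bin labeled by its left edge (0, 500, 1000, ...).
bin_width = 500
bins = Counter((c // bin_width) * bin_width for c in counts)

# Print a rough text histogram, one '#' per observation.
for left in sorted(bins):
    print(f"{left:>5} to {left + bin_width:<5} {'#' * bins[left]}")

# Descriptive statistics of the individual observations.
mean = statistics.mean(counts)
median = statistics.median(counts)
s = statistics.stdev(counts)  # sample standard deviation (divides by n - 1)
print(f"mean = {mean:.1f}, median = {median:.1f}, s = {s:.1f}")
```

With right-skewed data like this, the mean is pulled well above the median by the few very large values, which is one quick numerical check on the shape you see in the picture.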
So let's get this up. What does the distribution look like? Well, shape: skewed heavily to the right. And spread, we want to talk about the spread too. The standard deviation is kind of a scary thing to use when your data is skewed so heavily, so I'm going to give both, because I know I'll need the standard deviation for my hypothesis test. But by then I won't be looking at this; right now this is the individual raw data, and for the test we'll be looking at means of samples that are quite big. So, spread: I could say the sample standard deviation s of the individual countries is 726 items. It's saying it's better to use the range since the data is so skewed. If I look up here I can see the minimum was one: one country found only one Coca-Cola product. I'd love to know who that was. Maybe it's more than one country, actually; it probably is. And these two seem like outliers. So I'm going to say most values are between zero and one thousand items found for Coca-Cola, with two extreme outliers. I could use the histogram, but it'll only give me a range, or I could use the dot plot, which is a little more precise. So I would say around 3,400. It's a guess, so I'll say "around" rather than "at", and let's say approximately 4,100, just to let the reader know that I'm not looking at the data, I'm looking at the histogram. I don't think it'll tell me the actual value. No, it doesn't tell me the actual value right here; it doesn't seem like that feature is turned on. So we've got two outliers here; we can see for sure they're outliers. I wonder if that
would tell... oh, look at that: 4068. And the other one, it's not telling me. So that was pretty good. Four thousand, why not be as good as we can be, and 68, still approximately for the other one. Now it looks like it's more like 3300. But we have other outliers, so I'm just listing the two extreme outliers, but there are more outliers here: one, two, three, four, five. Five outliers out of 50; it's pretty significant. So some of these would work out to be too far away from the pack. Okay, but I think I have done more than enough to answer this question. Note that the random samples students describe... you know, okay, so I think I answered that probably to death. You might want to draw a sketch of that just to remind yourself, because the point of these notes is to help you remember what the lesson was. Okay, so the next question asks you to verify that the conditions are met for the one-sample t. It's called one-sample t because we're comparing one sample: we've got 50 observations, and we're going to look at the sample mean of that and compare it to what Coca-Cola says is the truth. So Coca-Cola gets to set the stage; they get to set what is perceived to be the correct answer, which is 275, and we are testing whether that is right. Was it 275? Let me take a peek; I'm going back to the research question, problem number five. Yeah, they say their average is 275.
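The description of shape, spread, and outliers above can be reproduced numerically. The following is a minimal sketch, not the worksheet's software: the counts are made-up placeholder values (not the brand-audit data), the function name `describe` is hypothetical, and the 1.5 × IQR fence is one common convention for flagging outliers that the lecture does not itself specify.

```python
import statistics

def describe(counts):
    """Summarize center, spread, and high outliers for a list of item counts."""
    # Quartiles using Python's default ("exclusive") method.
    q1, median, q3 = statistics.quantiles(counts, n=4)
    # Assumption: flag values above Q3 + 1.5 * IQR as high outliers.
    upper_fence = q3 + 1.5 * (q3 - q1)
    return {
        "n": len(counts),
        "mean": statistics.mean(counts),
        "median": median,
        "sample_sd": statistics.stdev(counts),  # s, the sample standard deviation
        "min": min(counts),
        "max": max(counts),
        "high_outliers": sorted(x for x in counts if x > upper_fence),
    }

# Hypothetical counts: mostly small values plus two extreme ones (right-skewed).
hypothetical_counts = [1, 2, 5, 9, 14, 20, 35, 60, 110, 250, 3000, 4000]
summary = describe(hypothetical_counts)
print(summary)
```

With right-skewed data like this, the mean sits well above the median and the standard deviation is inflated by the extreme values, which is why the range and the outliers are reported alongside s.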
We're now looking at the data to see if they're actually being accurate about that. So, verify the conditions. It's broken down here: part A is, are the samples random? And part B is, is the distribution approximately normal, or is the sample size for the sample from the population large? So pause and answer. Hunt back and look and see, and pause and come back. So if you read it, we were actually told in the setup that the samples were random, by the organization that organized this brand audit, Break Free From Plastic. So you can choose whether or not you trust them. I trust them; I trust this organization's word that the samples were random, or represent the larger population of plastics. Okay, so, and specifically, I trust... can you hold on just one second? Okay, sorry about that interruption. Let me go back to here. Okay, so I'm looking at this website, and I'm going to turn this blue so you can see my answers more clearly. So we were told the samples were random by the organization, Break Free From Plastic. I personally trust this organization; it's been published in several articles that are reputable and peer-reviewed. But maybe you don't; but I do, and I trust them more than what Coca-Cola would say, so it's all relative. So that's how I know that the samples were collected in an ethical way. And then the next thing you're going to want to check is, is the sample size large enough? Well, this is your best guess of what the population would look like. So the original population is heavily skewed and has some outliers. So we're going to hope that a sample size of 50, because remember, this is a sample size of 50, our sample size is n equals 50, we are going to hope that is large enough. It meets the threshold, the recommended rule of thumb of 30
for a t-test, a one-sample t-test, because we're only looking at one sample and comparing it to a known constant, what Coca-Cola says is the truth. But I'm going to say that, given that there are these outliers, these two outliers here, and there's actually a bunch more here, and we can see there's at least one, two... these could be overlapping... one, two, three, four, five, I'm just going to say, honestly, I wish the sample were a bit bigger. Because the fact that we know that 10% of our data, at least one, two, three, four, five, five out of 50 of our data points, are outliers, it concerns me. I would want a bigger sample size to make it more stable, but technically it meets the threshold. Okay. Write out the null and alternate hypotheses that would be used to answer the research question in number five. Remember to use correct notation. So this is kind of tricky, because I'm encountering problems in Canvas in getting you to be able to type in the correct notation; it's just not available in Canvas, so instead it's been multiple choice. On the final exam I might have paper and pencil for this, to just kind of bypass that, so be ready to answer questions with a blank like this, as opposed to what you see in your Canvas homework. So I'm going to switch to handwriting; we'll go to my beautiful iPad. Hopefully... there we go. All right, so we are now on problem number eight: write the null and alternate hypotheses that would be used to answer the research question stated in question five. Remember to use the correct notation. So we know H naught is always going to be letter, excuse me, letter, and it's going to be a Greek letter, then symbol, then, let's say, number. So you may have that. And H A is going to be very similar: letter, symbol, number. So I'm going to say same letter and same number, and for this, the symbol is going to be either greater than, less than, or not equal to. It's going to be one of those. We know that this symbol right
here is always equal, and we have to get the symbol down below from the context of the problem. So that's got to be context, this has to be context, this has to be context. So I'm going to go back to number five, because it's always good to be careful so you don't get your question wrong. So here's our research question: for products reported by Coca-Cola, is the average... oh, bing bing bing, you already knew that, we already identified it, but it's good to just be thorough about it. So our letter is going to be mu, and mu. And I know I'm using "u" in Canvas, because I cannot get Canvas to write this Greek letter, which is actually an m, so my bad, but I was desperate. And then, so we've got the average for the various countries. Oh, look at that, this is the research question: they're wondering if it's different. Now, I know that a smart, logical thing would be to think that Coca-Cola is lying and it's actually way more plastic, but the fact that it said "different from" means not equal. This is always the research question; this is what the researcher is really wondering. So they said different, and so the other one, the null, is always equality, and the number that Coca-Cola threw out there, and I'm suspicious of it, they're saying the parameter is 275.
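To make the hypotheses concrete: with H naught: mu = 275 and H A: mu ≠ 275, the one-sample t statistic measures how many standard errors the sample mean sits from 275. The sketch below is illustrative only; the function name `one_sample_t` and the sample values are hypothetical placeholders, not the brand-audit data, and it stops at the test statistic rather than computing a p-value.

```python
import math
import statistics

def one_sample_t(sample, mu_0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu_0."""
    n = len(sample)
    x_bar = statistics.mean(sample)        # sample mean
    s = statistics.stdev(sample)           # sample standard deviation
    standard_error = s / math.sqrt(n)      # estimated SD of the sample mean
    t = (x_bar - mu_0) / standard_error
    return t, n - 1

# Hypothetical per-country counts; the real sample in the lecture has n = 50.
hypothetical_sample = [250, 310, 290, 275, 330, 260, 305, 280]
t_stat, df = one_sample_t(hypothetical_sample, mu_0=275)
print(t_stat, df)
```

Because the alternate hypothesis is "not equal to", the test is two-sided: a t statistic far from zero in either direction counts as evidence against 275. If SciPy is installed, `scipy.stats.ttest_1samp(sample, popmean=275)` should report the same statistic together with a two-sided p-value.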
And we're wondering if it's not 275. So I think it's a good idea to define "where": also clearly define the parameter listed in your hypotheses. So I'll put: where mu equals the average, or mean, number of plastic items per country originating from the Coca-Cola company. So if we wanted to know what percent was Coca-Cola, then we would be doing a one-sample test of proportions, and that would be a z-test, because we would have a better way of calculating the standard deviation of the distribution of p-hats. But we're not there. All right, so that was number eight. Number nine: what if we are interested in answering the following question using the sample, so using the same data? So here is a different research question, so I will pick a very different color. Let's pick lime green. No, oops, don't want to obliterate it. The research question: for products reported by the Coca-Cola company, the average total plastics is less than the average calculated in 2019. Is this a one-sample or two-sample test? So we're comparing what happened in 2019 versus what happened in 2020. And we're wondering... H naught, well, H naught's always about equality, so, since this is a research question, I'm going to go to H A and let's write that one first. Is the average total of plastics found in various countries in 2020, so I'll say mu 2020, is it less than the average found in 2019? I can pick any subscript I want. So that would be the research question. And clearly they didn't... oh, we do want you to write it. Oh, my bad, this should go down here. So H naught, we haven't gotten to that one yet. H A: we're wondering if mu 2020 is less than mu 2019. So the research question is always the alternate hypothesis. So we're going to compare that to what's always, always, always... what do we always say for H naught? Equals. So everything else has to be the same: mu 2020 equals mu 2019.
Now, I think actually it was the other way around; if we looked at the data, there's no way this is going to pass. But as you can see, if we were going to look at this, we would have a data column of numbers from 2020 and we'd have a column of numbers from 2019. So we'd have numbers, numbers, a whole column of data, and what we'd do is grab all these and calculate mu 2020... nope, we would calculate x-bar 2020. And we'd do the same thing here, so this is data, data, data: we would add up all these numbers and divide by 50 and we would get x-bar 2019, and we would be comparing them. So this is sample one and this is sample two, so we are definitely not in a one-sample test; we're in a two-sample test, and it would be a two-sample t-test, and we'll talk about that in an upcoming section. So it's that. Write the null and alternate hypotheses that would be answered. Define the parameters of interest using the correct notation. Oh, they want mu 1 and mu 2. All right, so we've got to follow the rules. So mu 1 represents 2019. Okay, so we'll change that to mu 1, and they want 2019, and they want mu 2 to be the population mean for 2020.
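The two-column comparison described above can be sketched in a few lines. This is an illustration under stated assumptions: the lecture defers the two-sample t-test to a later section and does not say which version it uses, so the sketch assumes the unpooled (Welch) form; the function name `two_sample_t` and both columns of numbers are hypothetical, not the audit data.

```python
import math
import statistics

def two_sample_t(sample_1, sample_2):
    """Welch t statistic for H0: mu_1 = mu_2, computed as x_bar_2 - x_bar_1."""
    n1, n2 = len(sample_1), len(sample_2)
    x_bar_1 = statistics.mean(sample_1)    # sample mean for population 1 (2019)
    x_bar_2 = statistics.mean(sample_2)    # sample mean for population 2 (2020)
    v1 = statistics.variance(sample_1)     # sample variance, s_1 squared
    v2 = statistics.variance(sample_2)     # sample variance, s_2 squared
    standard_error = math.sqrt(v1 / n1 + v2 / n2)
    return (x_bar_2 - x_bar_1) / standard_error

# Hypothetical columns of per-country counts for each year.
hypothetical_2019 = [120, 340, 95, 410, 230, 180]
hypothetical_2020 = [150, 300, 110, 380, 260, 170]
t_stat = two_sample_t(hypothetical_2019, hypothetical_2020)
print(t_stat)
```

With H A: mu 2 < mu 1, only a clearly negative t statistic would support the research question; a positive value, meaning the 2020 sample mean is the larger one, points the other way. If SciPy is installed, `scipy.stats.ttest_ind(sample_2020, sample_2019, equal_var=False, alternative="less")` should give the matching one-sided test.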
So, gotta follow the rules, I guess. So we're wondering if... let me make sure: define the null and alternate hypotheses to answer this question. Make sure: the 2020 average is less than the 2019 average. Is that it? Yep, that's what they want us to say, even though, looking at the data, that's going to be ridiculous. So in order to do that, I have to make sure these are identical, or you've really screwed things up. So, mu 2 versus mu 1. So there we go. Define the parameters. Okay, so where mu 1 equals the average of all... oh, I didn't do this: number of plastic items. So I've got to make sure that I've defined the parameter. I just did "average number", but of what? Of plastic items. From where? So I've got to give my population of interest: for all Coca-Cola plastic items collected from all countries in... and we're doing mu 1, so mu 1 is 2019. Yeah, in 2019. So, where mu 1 is the average number of plastic items calculated from all Coca-Cola plastic items collected from all countries in 2019. So, per country; maybe that's overkill: per country. Okay, it's not beautiful, but I sure have explained to my teacher that I know what I'm doing: that I'm looking not at all plastics but at all plastics originating from Coca-Cola, and I'm looking per country and taking an average of all of them. So, do the same thing. If I had time on a test I might make this a little better, but it's good enough, and that's what statistics is all about: being good enough, not perfect. And so mu 2 equals the true... I'll make this one a little better: the true population average of all Coca-Cola plastic items found as trash per country, for every country in the entire world. We're not going to be able to get that information; we're never going to get that. So we're going to look at samples, and we were looking at a sample of 50.
And it gives us a good picture of what's going on in the world, and I think we're going to find that Coca-Cola wasn't being honest. So anyway, okay, we're done. We'll see you in the next video.