Transcript for:
Fundamentals of Statistics and Analysis

Hi everybody, it's Professor Mitchell, and we're starting today with Chapter 1, Introduction to Statistics. Now, this chapter has a lot of vocabulary. Some of it I'll go through kind of fast; for other terms I'll slow down and spend a little bit more time. This chapter has three sections, and you can see them right here. We're going to start with 1.1, which is Statistical and Critical Thinking, and you'll see the next two sections in the next two videos.

All right, so the key concept behind this section is the process of doing statistics. As you're gonna see on, I think it's the next slide, statistics is a lot more than just crunching numbers. So if you're feeling nervous about statistics because, oh, I'm not a math person, you know, that's part of it, but that's not even really the most important part of it. Prepare, analyze, and conclude: the analyze step is usually where the number crunching happens, and if I had it my way, everybody would have easy access to technology and we wouldn't even spend a whole lot of time talking about how to do the number crunching. We will spend some time on that, though.

Oh, it's actually on this slide, see that: statistical thinking involves critical thinking and the ability to make sense of results. One of the things that I really like about the textbook that this course is based on, the one I'm using these PowerPoints from, is that he, and by he I mean the author, asks a lot of really good critical thinking questions. Statistical thinking demands so much more than the ability to execute complicated calculations. That is so true.

All right, so you're gonna hear us talking a lot about data. I think most people, when they hear data, picture big, big, big sets of numbers, and yes, that can be data, but it doesn't even have to be numbers. If you're doing a study on Merced College students, one of the things that you might want to know is what is
the gender, or what are the genders, of the students involved? Are they freshmen or sophomores? Where are they from? Any of those things can be considered data: any sort of interesting aspects of the things that you're studying.

And then every single statistics book that I've ever taught out of, and there have been a few of them, has its own way of defining statistics, and I like this way: the science of planning studies and experiments, obtaining data, and organizing, summarizing, presenting, analyzing, and interpreting. So there's a lot that goes into statistics.

All right, continuing with the vocabulary, we're going to talk about a population versus a sample. That is a super important thing in statistics. When you're talking about a population, you're talking about everybody or everything that you're studying. So a good example: all Merced College students would be a population. I'm trying to draw some kind of conclusion about Merced College students, for example, that Merced College students who use the Student Success and Tutorial Center do better than those who don't use it. So that's a statement about the population. Now, they do have ways of figuring that out, but Merced College is a pretty big population, so what I might do to try to make that conclusion, and I'm just gonna skip down here to where it says sample, is talk to a sample of students who use the Student Success and Tutorial Center versus a sample of people who haven't. Now, you have to be careful about how you choose that sample, and we're going to get into that a little bit more later, so I'll just leave it at that: you want to be careful about how you choose the sample. It should be random; it should not be biased.

All right, so let me give you another example. Let's say that the library, and this is not true by the way, has proposed cutting their hours, let's just say, and you're trying to gather what is
Merced College students' opinion on that. It would not be fair of you to stand in front of the library and ask the first hundred people who come into the library, how do you feel about the library proposing to cut their hours? The library would say that's not fair, the way that you're doing that survey, and they would be right. So, you know, you should include others too, because I would like to think that all Merced College students use the library, but that's probably not true. Maybe one semester you're not taking an English class or some other class that uses the library, and so maybe that semester you don't really care what the library's hours are. The people who feel that way, they should have a chance to get into the results of this. All right, so that's the kind of thing we're going to be talking about a little later.

A census is where you literally ask every single person in the population. I don't know about you, but when I hear the word census I think of the US Census, where every ten years they literally ask every single person who lives in the country, every adult I guess, to fill out a form with information about their family and this and that and the other thing. So that's where you're asking everybody, but most of the time it's not practical to do that, and so we just go with samples.

All right, so here's an example of what we're talking about. In the journal article "Residential Carbon Monoxide Detector Failure Rates in the United States," that sounds like fun reading, it was stated that there are 38 million carbon monoxide detectors installed in the United States. When thirty of them were randomly selected and tested, it was found that twelve of them failed to provide an alarm in hazardous carbon monoxide conditions. So that's kind of troubling, right? Twelve out of the thirty did not do what they're supposed to do. So let's talk about it: what is the population in this scenario, and what's the sample? The population would be all 38 million carbon monoxide
detectors. So this is the perfect example where it is not practical or even possible to figure out how many of the 38 million carbon monoxide detectors work. I just can't do that, so instead I pick a random sample of 30. And by the way, 30 is a very, very small sample; we could probably do better than 30, but it is an example of a sample. The sample would be the 30 carbon monoxide detectors that were selected and tested. So the objective is to use the sample data as a basis for making a conclusion about the population, and the bigger that you can make your sample, the better and the more sound your conclusion will be.

All right, so this is very small writing, so especially for the people in the room, I'm gonna try, whoops, I think I just printed my screen somewhere. Okay, well, this is probably about the best I can do, and I don't want to go through every word of this. This is just giving more details about the prepare, analyze, conclude idea. Under prepare, you have things like: what do the data represent? What are you trying to do? Do the data come from a good source? I see this kind of thing argued on social media all the time: where did you get this data? And how did you collect your sample? Those are all important questions to ask when you're preparing your statistical study.

Analyze, I mentioned before, is where you do your number crunching. You might want to make a graph, and then there's exploring the data: are there any outliers? These would be really atypical, unusual numbers. Think of a class where there was an exam that was really, really, really hard, the kind where somebody asks, are you gonna curve this? Maybe the highest score anybody got was a 70, except for that one person, you know who they are, who got a 98. So that 98 might be an outlier. What kinds of statistics summarize the data? We're going to be talking about the mean and the standard deviation a lot. The mean is the average, and you probably know what that
means. The standard deviation, we're going to talk about that later; that is a measure of how spread out the data are. How are they distributed? There are different terms you can use for that. Is there any data missing? Et cetera, et cetera. And then under conclude: what does this mean, basically? That's usually, I think, the hardest part. Analyze is the easy part, really, the part with the number crunching. Figuring out how to pick a good sample and how to make your conclusion, those are, to me, the hard parts.

All right, so here's another example. This is kind of a sad example: pleasure boats and manatee fatalities from boat encounters. We will not spend a lot of time on this example because it's kind of depressing, but this is just showing you an example of where you might want to see if there is a relationship between two sets of data. One idea is that the more pleasure boats there are out, the more likely it is that the sea creatures might get hurt from the boats, right? So this table includes the number of registered pleasure boats in Florida, in tens of thousands, and the number of manatee fatalities from encounters with boats in Florida for each of several recent years. If somebody was asking me, well, what do you think we're gonna do with this, I would guess that we're going to see whether there is a relationship between the number of pleasure boats and the number of fatalities. My guess would be that the more boats there are, the more fatalities there are. Whether or not that's true, we're not trying to determine that now. Maybe we'll come back to this example later when we're ready to do that. I kind of hope not; maybe there will be a more uplifting example.

These data are from the Florida Department of Highway Safety and Motor Vehicles and the Florida Marine Research Institute, so that seems to be a good source for where to get this data. I didn't just get it off
Reddit or 4chan or something like that, and the data were obtained from official government records that are known to be reliable. So it seems like we've got all our ducks in a row when it comes to sampling.

Okay, so now we get into different sorts of things that can go wrong with sampling. One way to collect a bad sample is to do what's called a voluntary response sample. This is one where the respondents themselves decide whether to be included. Think internet polls. Don't try to draw any sort of meaningful or reliable conclusion from an internet poll. Another example of a voluntary response situation: have you ever heard of a website called Rate My Professor? Voluntary response. The people who are really mad, and sometimes the people that are really happy, are the people that you see writing stuff on Rate My Professor. The people that just feel kind of neutral about it usually don't bother.

So here are some more types of polls that are common examples of voluntary response samples, and you should take these with a big, big grain of salt: internet polls; mail-in polls, where people can decide whether to reply; and, I can't remember the last time I saw one of these, no, I have actually, telephone call-in polls, in which newspaper, radio, or television announcements ask that you voluntarily call a special number to register your opinion. I don't know if you've ever noticed, there's a channel on cable TV called C-SPAN; that's the one that covers government. Every morning, and I think it comes on too early in the morning in California because it's broadcast from the East, they have a show where people call in to give political opinions. Again, these are not people that feel neutral about political stuff, so don't try to draw any conclusions from what the callers to C-SPAN are saying.

Okay, so here's an example: Nightline, which is a TV show, asked viewers to call with their opinion about whether the UN
headquarters should remain in the United States. Viewers then decided themselves whether to call in with their opinions, and 67 percent of respondents said the UN should be moved out of the United States. Well, compare that with a separate, independent survey where 500 respondents were randomly selected and surveyed, and only 38 percent of them said the same thing, that they wanted the UN to move out of the United States. On the one hand, you might think, well, a hundred and eighty-six thousand people in the call-in poll, that's a lot of people, it's a lot more than 500. Well, guess what, the second one is a better indication of how people actually feel, because the people did not get to self-select. Which I think is what this says, yeah: the smaller poll of 500 respondents is more likely to provide better results because the sampling method was better, not voluntary response.

Okay, and that brings us to analyzing. After completing our preparation by considering context, source, and sampling method, we begin to analyze the data. One way to do that is by graphing and then applying statistical methods. A good statistical analysis, and this might be good news to some of you, does not require strong computational skills. I mentioned to you before, in a perfect world I would love someday to teach a statistics class where everybody has either a graphing calculator or a laptop with Excel on it, and we don't have to do any number crunching by hand. We can just feed it into the computer and spend all the time talking about these other things, like how to collect a good sample and how to make a conclusion, because those are really the important things.

And then the final step, conclude: we should develop an ability to distinguish between statistical significance and practical significance. So statistical significance is achieved, and it's been quite a long time since I've taught out of this particular book, but one thing I remember he does is he has a really good, simple definition of statistical
significance: that means that the likelihood of an event occurring by chance is 5% or less. So a couple of examples: getting 98 girls in 100 random births, that would be significant, because that is really, really, really unlikely to happen by chance, 98 out of 100. 52 out of a hundred, that could easily happen by chance; that would not be considered statistically significant. For example, there are all these old wives' tales about, if you're trying to have a child, making it more likely that you'll have a boy or that you'll have a girl. So we might be studying whether any of those actually work, and as far as I know they don't.

So, practical significance: it is possible that some treatment or finding is effective, but common sense might suggest that the treatment or finding does not make enough of a difference to justify its use. Here comes an example of that. Ah, perfect, exactly what we were just talking about. ProCare Industries once supplied a product named Gender Choice that supposedly increased the chance of a couple having a baby with the gender that they desired. In the absence of any evidence of its effectiveness, the product was banned by the FDA as a gross deception of the consumer. So suppose this product was tested with 10,000 couples who wanted to have baby girls, and the results consist of 5,200 baby girls born in 10,000 births. That would be a statistically significant result, because the chance of getting that many girls or more by chance is only about 0.003 percent. However, this result does not have practical significance, because if you really, really want to have a girl, this product could just claim that they make the likelihood you'll have a girl 52%, and that's not that far off from 50%. And then they include this little tidbit that I didn't know: in reality, the likelihood of a baby being born a girl is only about 48.8 percent. I didn't know that.

Okay, some things that can go wrong when it
comes to analyzing data. Reaching misleading conclusions. Reported sample data instead of measured data: so if you're, for example, trying to do a study of how tall people are, you should actually measure their heights yourself and not ask them, how tall are you? An even better example of that would be weight. Don't ask people how much they weigh; if you really, really need to know that, then weigh them, and hopefully they won't hit you. Loaded questions: a lot of this is beyond what we're going to do in an elementary statistics class. If any of you are going into social sciences, maybe some of you want to be psychologists or something, you will probably take a class called research methods, and in a class like that you talk a lot more about how you write a good survey question. Even the order of questions can be important. Nonresponse: you need to account for that. Sometimes people don't want to answer, for example when you ask them how much they weigh.

And then be careful about percentages. Some studies cite misleading percentages. Note that 100 percent of some quantity is all of it, but if there are references made to percentages that exceed a hundred percent, such references are often not justified. Notice it says often. There are times when percentages over a hundred percent do make sense. I could say that I made some change in my teaching that increased my success rates by four hundred percent. That is absolutely possible. Something increasing 400% means that if, let's say, ten people passed my class before I started doing this and 50 of them passed after I started doing it, that is an increase of 400%. It went up by 40 people, and 40 out of 10 people is 400%. It does not make sense, though, for something to decrease 400%, and it doesn't make sense to talk about having 400 percent of something.

All right, and that is the end of section 1.1.
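
For anyone who wants to check the arithmetic from this section with technology rather than by hand, here is a minimal Python sketch. It is not from the lecture or the textbook: the function names percent_change and chance_of_at_least are made up for illustration, and the birth calculation assumes each birth is independently a girl with probability exactly 0.5.

```python
from math import comb


def percent_change(before, after):
    """Percent change from `before` to `after` (positive means an increase)."""
    if before == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (after - before) / before * 100


def chance_of_at_least(k_min, n):
    """Probability of k_min or more girls in n births.

    Assumption (not from the lecture): every birth is independently a girl
    with probability 0.5, so all 2**n boy/girl sequences are equally likely.
    """
    favorable = 0
    term = comb(n, k_min)  # number of sequences with exactly k_min girls
    for k in range(k_min, n + 1):
        favorable += term
        # Step from C(n, k) to C(n, k + 1) using exact integer arithmetic.
        term = term * (n - k) // (k + 1)
    return favorable / 2 ** n


if __name__ == "__main__":
    # The passing example: 10 students before the change, 50 after.
    print("percent change:", percent_change(10, 50))

    # The 5% cutoff for statistical significance described in the lecture.
    for girls, births in [(98, 100), (52, 100), (5200, 10000)]:
        p = chance_of_at_least(girls, births)
        verdict = "significant" if p <= 0.05 else "not significant"
        print(f"{girls} or more girls in {births} births: {p:.3g} ({verdict})")
```

Under the 0.5 assumption, the 5,200-out-of-10,000 case should come out as a probability in the neighborhood of the 0.003 percent quoted in the lecture, while 52 out of 100 lands well above the 5% cutoff. For a count that cannot go below zero, percent_change can never return less than -100, which is one way to see why a 400% decrease does not make sense.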