Hello, welcome back. In class activity 1D we're looking at bad drivers, and we're using an article from one of my favorite data scientist journalists, Mona Chalabi (I think that's how you pronounce her last name). She's an amazing journalist. She also makes a lot of very interesting work; it is art, but it's really graphics of data, and it's quite shocking, really shocking material. It really gets your attention. So if you're bored, you should Google her and then look at the images, because she's very creative and she makes data very interesting.

She gave us an article, so let me share my screen so that we can actually have a visual here. All right. I love this journalist and data scientist; she's really creative. She's on lots of different websites. I think she works for The Guardian, maybe The New Yorker. But I also want to draw your attention to the website that hosted that article, FiveThirtyEight. It's a great website. A lot of times it's hard to find the data that goes along with an article: the articles go on and on and give you broad statements, but they don't actually give you the data. This is one of the rare places that gives you the data. So if you have to do a project in a different class where they want you to have data, this is a good place to go.

The article in question is from something called Dear Mona. Let me see if I can get that up; you should have read it already. I hope you're seeing the website here. I can't ask you, but I think you are. The question being posed is "Which states have the worst drivers?", and the question on the worksheet asks you to focus just on that question.

I do want to point out that she started her answer with "this is a tricky one." That's actually a good thing in statistics. We want questions where there's not a clear-cut, exact answer, because otherwise it's not interesting.
You don't need to employ a statistician to answer the question "On average, how many toes do people have on their feet?" You don't want to do a research paper on that. More interesting: do people have good access to health care? That is a broad, overarching question, open to interpretation. What does good health care mean? Does it mean that if you have an emergency, you go to the emergency room and you're taken care of? I think a better marker for whether or not people really have access to health care is how many times a year they go to the dentist, or how many times a year they see their eye doctor to find out if they need glasses. Those are the survey questions, the little questions that inform the broader research question. And in this section, 1D, when they ask whether something is a good statistical question, they're asking about that broad research question, the question of investigation. (I have trouble saying that.)

So Mona starts with "this is a tricky one," and then she breaks it down and focuses on fatalities, which I think is an excellent thing to do. What's a better marker of someone being a terrible driver than how many people get killed? I think it's excellent. She gives us some visuals, which I love. It looks like a lot of the drivers were not distracted, though that might be self-reported. Then there's the percentage of drivers involved in fatal collisions who had not been involved in previous accidents. I find this one quite worrisome, because if you look at the data sheet, most people who have fatal accidents had not been in previous accidents. It's people like you and me, and then all of a sudden our lives are over. So please be careful when you're driving.

They're all interesting, but I want to skip to the end, to her closing: I'm sorry there's no easy answer here, Lisa (that's who asked the question); the number of crashes, even the fatal ones, just isn't a clear-cut way to understand who is and who
isn't a bad driver. Well, as a researcher you get to define what it means to be a bad driver, and I'm going to say I like the definition that your state is that bad at driving if it has more fatal accidents. I'm going to stick with that. She switches gears to look at insurance providers, and I'm a little bit suspicious of that, because I suspect that fatal accidents are not as expensive as accidents where people are hurt but not killed. So I never fully trust those numbers; following the money doesn't always get you to the truth. You be the people who decide what it means to be bad at something. Don't follow the corporate message all the time.

So let's switch out of the article and go to our worksheet. I'm going to stop sharing and share the worksheet instead. Now, instead of the nifty visuals, we have the beautiful raw data. But before we get into that, I want you to answer this three-part question. First, what made the question a good statistical question? The question is "Which U.S. state has the worst drivers?", so why is that a good statistical question? After that, do you think she answered it well? Did you like her approach? If it is a good question, how did she go about answering it? To actually answer that broad question, you'd better come up with good survey questions. And the last part: in particular, were the data she used the right data for this question? Well, she is an excellent journalist, so I'm going to say yes, but you want to defend your answer.

So answer the green, blue, and gray parts of this question. Pause, and come back when you're feeling good about your answers, and then I'll give you mine. Of course, your answers don't have to be exactly like mine, and I'd actually like to see evidence of two different answers for each of these.

Okay, welcome back. So why is this question a good statistical question? Well, in the preview activity they made it really
clear what the characteristics of a good statistical question are, and remember, this is about the broad research question. The first characteristic to check is: is there an exact answer? Maybe in math an exact answer is good, but in statistics it's not. So this is a good, actually a great, statistical question. When you're answering questions on a quiz or during the term, you do want to use statistical language, so I want to hear you echo back the language that we're covering. This is a good statistical question because there is no exact answer. That's kind of confusing, kind of counterintuitive: we don't want things to be cut and dried. We want even the question to be open to interpretation. So the question and the answers are open to interpretation. We'll get to some solid examples of that in a minute.

Another characteristic: it's a good question because the responses anticipate variability. I'm going to be more specific than the preview activity and say that the responses to the survey questions are likely to have variation. The language in the preview activity was that the investigative question anticipates variability, and I think that's a little confusing. You might say, well, wait a minute, if you ask me how old I am, there's a precise answer. Well, yours is yours. That's a survey question, and your answer will vary from another person's, and another's, and another's. So you come up with good survey questions that are very clear and precise, but there's variation, or variability, from one respondent to the next; in this case, from one state to the next.

One of the survey questions that Mona came up with was how many fatalities there were per some amount. I don't know if it was per hundred thousand or a hundred billion, people or miles; I don't know exactly what it was. But she had a rate for fatal accidents, and every state will have a different rate. There will be natural
variation from state to state, but the answer within each state is pretty clear. Okay, so the responses to the survey questions are likely to have variation. If they all have the same answer, there's no point in studying them and wondering what the truth is.

And another characteristic is that the broad question is relevant and/or interesting. What's on my mind these days is access to health care. Those questions are relevant and interesting to me because of COVID, and because I can see all the inequity right in front of my face. But if we're focusing on this article, on which state has the worst drivers, let me speak as a parent. I used to not care about it. I thought, I'll be a good driver and that's all I need to worry about. But now that I have kids, I think about it all the time. So if it's not relevant to you now, it will be, and it's relevant to a lot of people. So it hits all the marks there.

Okay, part B: do you think she answered the question well? I'm going to say yes. She answered the broad research question, which state has the worst drivers, well because she took the time to define what a worse driver is. I liked her first step of saying that, for worst drivers, let's focus on fatal collisions; those are the worst. And she followed it up with specific survey questions that remained focused on the observational units.

What were the observational units? Observational units are the things that are being measured or closely looked at in the study. So in this case, are the observational units the people who are driving the cars, or are they the states within the United States? Well, if you go back up here: which U.S. state has the worst drivers? So in her survey questions she focused on the states. She didn't focus on the people within the state; she focused on the overall rates, the fatality rates, the distraction rates, the speeding rates, of the entire state. So I think she did a
great job that way.

And the last part: in particular, were the data she used the right data to answer the question? There's some bleed-over there from part B, I think. Because she focused on the states, her questions made sense: they were related to bad drivers, to her definition of bad drivers. And she also made sure to look at all the states. So for part C, I'm going to say it's the right data because it focused on the states and it looked at all the states. So: good data questions about the states, and she made sure all states were represented.

Well, I think you know you're done with the question when there's no more room on the paper, so I'm going to stop there. You may have thought of some other things, and that would be awesome. I'm sure I didn't hit everything, and if your answers are different from mine, that's all the better.

So let's see what the overview is for what we're doing today. We're looking at the data. Data must be collected with purpose. Well, what drives that is having a good question, so that the data are appropriate to answer the question of interest. So you've got to be very purposeful in how you create your question and how you collect the data, and that's what we're introducing you to in this section.

So you'll be able to determine if a question is a good statistical question. To be a good statistical question, we want it to hit these three points right here. The ones we're going to discuss more are these two, because this other one could be an opinion; we can agree to disagree on whether something's interesting or relevant. So we're going to focus on these two as the characteristics that determine whether it's a good statistical question.

You'll also be able to determine whether a question can be answered with a given data set. We have a good example here, and I'm going to make up some bad examples to show you the alternative. And finally,
construct statistical questions that can be answered with the given data set. This one's going to be a little tricky, because I had planned for you to do group work and compare your questions, and that would have made it interesting. But since this is just a one-way dialogue, if you're not sure, come to the study session and share your statistical questions. We'll have a group discussion about what's great about your question and where the areas for growth, the problematic areas, are. When you're learning, there are always going to be some; making mistakes shows that you're growing and learning. So come have us pick your question apart.

So here's the data, and it looks intimidating. Even after all these years, when I'm confronted with a data set I'm often intimidated, because the variable names are scary. Let me find that one. Now, this is a scary variable because of what it measures, but what I mean is that I get intimidated by an abbreviated header like the "no previous" one. That's not user-friendly. But if you read it, it's the percent of drivers involved in fatal collisions who had not been involved in previous accidents. So that is actually really scary. This tells me that in Alabama, for 80 percent of the people who were in fatal car accidents, it was their first accident. How does California compare to Alabama on that one? Just about the same. Yeah, 89 percent of us Californians who got in fatal accidents, in whatever year this was, had never been in an accident before. That's scary. So please, please be careful when you're driving, especially near the holidays.

So that's one right there. Whenever you're looking at a data set, you really want to take a moment to understand what each variable is. So, insurance premiums: the average combined car insurance premium. And then there's a losses one. So this one is the insurance premiums. How's California doing? I bet it's high. It's right here; I
think this is us. They're not saying whether that's annual; I have no idea, but I hope it is. So the average Californian pays almost a thousand dollars, just under a thousand dollars, for car insurance. Who's the worst? That's what I'm interested in. Louisiana is the worst. So you could say that Louisiana has the worst drivers. But I think if you're a really bad driver and you die in a car accident, your insurance rates don't go up. So maybe these are fender benders. I don't know; I would need more information on that. Me personally, I'm not going to use that as my metric for defining whether or not people are bad drivers, because the insurance companies have all sorts of other things that they're thinking about.

Okay, so get to know this data set. Don't be intimidated. Whenever you're doing research, take the time to just calm down and read the headers, and they're not going to be as bad as you think.

Okay, so: give another example of a statistical question you could answer given the data that were used in the worst-drivers article. I don't think you saw this in the preview activity. When I'm answering this question, I'm going to be looking at the data set that we just went over, right here. I'm going to start by saying what I am interested in. I have young adult children now, and I want to know whether there's a relationship between how old people are and the chances they're going to be in a fatal car accident. I think about it all the time. So: is there an association between the age of a driver and the likelihood of getting in a fatal accident? I'm really interested in that because my kids are driving, and they're in their
20s still, so it worries me. Another question that I'm interested in is: do colder states have more fatalities? (Some of the colors blend together.) So both of these are new questions. I'm not worried about whether they're good drivers or bad drivers; I'm just wondering if colder states have more fatalities.

So are these good questions? The prompt here is: give an example of another statistical question you could answer given the data set that was used in the worst-drivers article. I'm going to get out my red pen here. I can't ask that first question. There are lots of problems with it. First, its focus is on people, not states. So I could change my question. I could say: is there an association between the average age of drivers in a state overall and the number of fatalities per 100,000, or per some other amount? So I could massage it and put it back on the states. But even then there's a big problem: we weren't given any information about the age of drivers. So no, you can't ask that. While it's relevant and interesting to me, and it might be a good statistical question, it's not one for this data set.

And similarly for the next one. The first is not good; this one is a little better. But what's the problem here? Just gently put a line through it. The problem here is that there's no information on temperature. There's nothing about temperature in this data set, so it doesn't work.

So let's come up with some good ones, and I bet you have come up with some. Looking at the data that's there, I like to focus on the relationship between two different variables. So I'm going to say: is there an association between alcohol rates and premium rates? I've got to actually tie it to what they said: percent of drivers involved in fatal
accidents who were alcohol impaired. So I'm going to say alcohol fatality rates and, I don't have to be that specific, insurance rates for states. Keep the focus on the states. I've got all the data there. This is just about getting the question; later you'll learn techniques to measure the association. My hunch is that the higher the alcohol fatality rate, the higher the insurance is in that state, because if there are fatalities, it probably goes hand in hand that there are just more fender benders involving alcohol too. So that's one good question.

And another good question, just for fun. That last one was "is there an association." For the next question, I'm just going to go out on a limb and ask: if one is high, is the other high as well? This is what I was wondering: do higher fatality rates lead to higher insurance costs? So is that a good question? Well, there's no exact answer. We don't know the answer; we can't look it up. We might know it for 2015 or 2010, but we don't know it in general, for the future. So it's also open to interpretation. And the responses to any survey question we use to make it more specific are going to vary from state to state. And once you have to start paying for your own car insurance, all of that's going to be interesting, because it's so expensive.

Okay, so those are mine. Yours are probably different, and that's awesome. Because we are online, there's not going to be any group work, which is sad, but if I have time, I might set up some kind of discussion board.

So where are we right now? If you recall the circle, you make a question and then gather data, and then you keep going from there. Right now we're thinking about survey questions, and really bringing it home that those survey questions have to be about the objects
that you're studying, and those aren't necessarily people. For example, if you're interested in whether animal shelters are no-kill or kill shelters (I'm always thinking about dogs), you're not going to be focusing on individual dogs. You're going to be focusing on the rate at that shelter versus another shelter. So your objects of interest in that case would be the shelters.

So we're done with this one. Let's see how we did. Data must be collected with purpose: you've got to have a great question so that the data are appropriate to answer it. I went through some good examples and some bad examples, so I hope that's okay.

Here are the more concrete skills. Determine whether a question is a good statistical question: you want to go through these two characteristics for sure, because this other one is an opinion. Determine whether a question can be answered with a given data set: are you looking at the right objects, are the answers understandable, and are the definitions clear? While the broad question might be open to interpretation, by the time you've gotten to the survey questions you want things to be clear and very specific. And construct questions that can be answered with a given data set: we've done some practice on that.

So now turn this video off and go do something nice for yourself for 5 to 10 minutes, and then ideally knock that practice out right afterwards. If you're not sure about your answers, go to the math lab online or come to our optional study sessions; the times are posted in Canvas. Okay, thank you. Bye.
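
For anyone who wants to try the "can this data set answer my question?" check from this session on a computer, here is a minimal sketch in Python. It is only an illustration, not part of the worksheet. The column names are assumed stand-ins for the worksheet's headers (the real file may spell them differently), and the helper names `missing_variables` and `can_answer` are made up for this example. The idea is simply to list the variables a question needs and see which ones the data set does not have.

```python
# Assumed, simplified column names standing in for the worksheet's headers.
# They are NOT copied from the actual data file.
AVAILABLE_COLUMNS = {
    "state",
    "fatal_collision_rate",        # fatal collisions, as a rate per state
    "pct_speeding",                # % of drivers in fatal collisions who were speeding
    "pct_alcohol_impaired",        # % who were alcohol impaired
    "pct_not_distracted",          # % who were not distracted
    "pct_no_previous_accidents",   # % with no previous accidents
    "insurance_premium",           # average car insurance premium
    "insurance_losses",            # insurance company losses
}


def missing_variables(needed, available=AVAILABLE_COLUMNS):
    """Return, in sorted order, the needed variables the data set lacks."""
    return sorted(set(needed) - set(available))


def can_answer(needed, available=AVAILABLE_COLUMNS):
    """A question is answerable only if every variable it needs is present."""
    return not missing_variables(needed, available)


# The candidate questions from this session, with the variables each one needs.
questions = {
    "Is average driver age associated with fatal collisions?":
        ["driver_age", "fatal_collision_rate"],
    "Do colder states have more fatalities?":
        ["avg_temperature", "fatal_collision_rate"],
    "Are alcohol-impaired fatality rates associated with insurance premiums?":
        ["pct_alcohol_impaired", "insurance_premium"],
}

for question, needed in questions.items():
    missing = missing_variables(needed)
    verdict = "answerable" if not missing else "missing: " + ", ".join(missing)
    print(f"{question} -> {verdict}")
```

A check like this only tells you whether the variables are there. It does not tell you whether the question is a good statistical question, or whether it is about the right observational units (states, not individual drivers); those are still judgment calls.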