Transcript for:
Basic Concepts of Statistics Introduction

Hello, and welcome to the first lecture in your statistics class. My name is Professor Jenna Hirsch, also Dr. Hirsch, and I am going to be here with you on your journey through an introduction to statistics.

We are going to start today's lecture by either printing out the lecture notes that you have in your Blackboard shell or by finding a Word version of them and writing directly on your computer. I am going to read lesson one to you and give you vocabulary and ideas about statistics from lesson one today. As you're listening, if you printed the notes out, you can write in the vocabulary and notes as I go through this lecture; if you have a Word version and you'd like to do it electronically on your computer, you're more than welcome to do that as well.

So let's get started. Today we're going to talk about an introduction to statistics and some basic statistics vocabulary for your first lesson. One of the most interesting aspects of a statistics class is that we really do use statistics in everyday life. We use statistics in decision making on a daily basis. Well, how?

One of the most important ways that I use statistics in the real world is looking at the weather app on my phone. Let's think about the following statement: there is a 40% chance it is going to rain today. If you see this on your weather app, are you going to bring an umbrella with you? Are you going to wear your rain boots? Or do you say to yourself, "Well, I'm not going to worry about it, because it's not over 50%, so the possibility that it's going to rain is fairly low"? With a 40% chance of rain, some of you may decide to bring your umbrella and rain boots, and some of you may not. But let's change up the 40%. What if we said there was a 70% chance it's going to rain today? Then do you bring your umbrella and your rain boots? What about a 90% chance? Where is that threshold
where you make the decision that, even though there's not a hundred percent chance and we can't tell for sure that it's going to rain, we should protect ourselves from it by bringing that umbrella and wearing our rain boots?

Another example of when we make statistical decisions in the real world is a pregnancy test. Maybe you don't know this, but that's okay, you'll learn something new: most pregnancy tests are really effective, 99% effective. Now remember, 99% effective means that out of every 100 tests taken, 99 of them are going to give you the correct result, which means that one out of 100 is going to give you an incorrect result. But as those of you with children or siblings may know, when you take a pregnancy test you don't just take one, you take multiples. Why? 99% effective is fairly effective; the probability that your test is correct is really strong. So why do people buy two, three, or four pregnancy tests? These things are expensive. Why buy all of these tests to confirm something that the first test already told you?

Here is another example of using statistics in decision making, and I want you to pretend right now. You've been suffering with a fairly bad medical condition, the doctor prescribes you a medication, and on the label is the following warning: 30% of people taking this medication have side effects that may make them extremely ill. Well, you have to make a decision. Will you take the medicine and risk a side effect that will make you really sick, or do you forgo the medication for something else that may not work as well but
has less of a probability of giving you nasty side effects? Some of you may say, "Well, it depends on why I'm taking the medicine." Am I taking it because I'm really sick? If this is a matter of life and death, then yes, you're going to risk that 30% chance of getting ill from the side effects. But what if you're taking the medicine because, let's say, your stomach hurts? I don't know if I'd take stomach medication to make my stomach feel better if I knew there was a possibility of some pretty nasty side effects from it. So we make these statistical decisions every day in the real world.

Another example: you may have heard that we're swapping out old light bulbs for new light bulbs, and the new light bulbs have a much longer average lifespan than the old light bulbs did. Say the new light bulbs have an average lifespan of three years. That's fantastic. So does that mean I'm going to spend an extra $2.50, $3, $4, or $5 on the extremely long-lasting light bulbs? And what does it mean that they have an average lifespan of three years? Does that mean that all of them will last three years? What do we mean when we talk about an average?

These are all questions that we're going to be answering this semester. These are statements that you may have heard in the news, read in a newspaper, seen online, or discussed with someone. The nice thing about statistics is that it's used in almost all facets of life that we can think of. Statistics really is the mathematics of real-world data. So no longer are you going to ask the question that my students always ask me in my algebra class, which is, "When am I going to use this?" I'm going to answer your question right now: we are going to use this everywhere. We use
statistics in so many facets of the real world, and that's what we're going to talk a lot about this semester: how do we use statistics, and how do we make decisions based on the statistical information that we have?

So let's talk about one of the first splits in statistics. This semester is going to be divided into two parts. In the first half of the semester we're going to talk about descriptive statistics, and in the second half we're going to talk about inferential statistics. So your midterm will focus mostly on descriptive statistics, and your final exam will focus mostly on inferential statistics.

Well, what are the differences between descriptive statistics and inferential statistics? Let's start by defining what we mean by descriptive statistics. Descriptive statistics are the basic statistics that describe what's going on in a population or data set. They are important because they allow us to see patterns in our data and make sense of the data. It's important to realize that descriptive statistics can only be used to describe the population or data set that we study, so we can't make generalizations about other data based on this data.

In descriptive statistics, data is collected, organized, and summarized. When we talk about organizing the data, we can organize it in terms of graphs or pictures, or in terms of numerical values. I hope you can see that inside the word "descriptive" is the word "describe": this is where we describe the data. So in descriptive statistics we're going to talk about, first of all, how
do we properly collect data? We're going to spend the next class or two talking about that. Then we're going to talk about how we organize data. What are the different graphs that we can use to help us organize the data? There are so many different types of graphs, and we're going to talk about a lot of new ones that go hand in hand with statistics. We also organize in terms of numerical values: if you have 10,000 or 20,000 pieces of data, how do you take all of those pieces of data, knock them down into a handful of values, and make sense of them? And then we summarize. We talk about what these collections, graphs, pictures, and numerical values have in common, and we try to figure out why our data is behaving the way it is.

Now, the difference between descriptive statistics and inferential statistics. Inferential statistics allow us to make decisions or infer trends about a larger population based on the study of a sample taken from it. In inferential statistics, we take a sample of data and use that sample to make generalizations or predictions about how variables will relate to a larger population. We also use inferential statistics to examine relationships between variables. We're going to talk about exactly what that means in a little bit.

So in descriptive statistics, we're not saying anything about the general population from which we pull our sample. In descriptive statistics we're only looking at our data and trying to
make decisions about our data. In inferential statistics, we're looking at our data and trying to make decisions and infer trends about the overall population from which our data came. That's what we're going to talk about in the second half of the semester.

So for the beginning of this class, we will rely heavily on descriptive statistics. We'll talk about how we can organize data, visualize data, and summarize it. And like I said, the second half of this class relies more on inferential statistics, which will require analyzing statistical data to make informed decisions.

For now, let's consider some basic definitions that we will use when describing statistics. When we decide to use statistics to help us gather information, we first become concerned with data collection. Data is the raw material from which information can be obtained. We can break data up into different categories, and the first big split that we are going to talk about is qualitative data versus quantitative data.

Generally speaking, when we measure something and associate a numerical value with it, that is what we call quantitative data. I hope you can see that the word "quantitative" has the root "quantity"; remember, quantity is the amount. With quantitative data, we're dealing with characteristics that can be measured objectively and expressed in terms of numbers.

Now, qualitative data. I hope you can see that the word "quality" is in there. With qualitative data, we're dealing with characteristics and descriptors that can't be measured objectively but can be observed and described. So when we talk about qualitative data, another way for us to think about
it is quality. Qualitative data is also known as categorical data, so sometimes we'll see the word "categorical" instead of "qualitative." So our first main split in data is between quantitative (quantity, data we associate numbers with) and qualitative (a quality, a characteristic or descriptor that can't be measured but that we can observe and describe).

Now, something to be careful of: just because a piece of data is given as a numerical value does not mean that it's quantitative. As an example, let's take a look at this guy. Hopefully I'm not that old and you know who this is: one of the best basketball players of all time, Michael Jordan. Michael Jordan is synonymous with the number 23. Just because 23 is a number, it doesn't necessarily mean that it's quantitative data. It's a number, but measuring it makes no sense. 23 is actually a characteristic or quality that describes Michael Jordan. So while 23 looks like a number, not all numbers represent quantitative data.

Let's take a look at the example below. This is a tree. We can talk about different characteristics that the tree has using both qualitative and quantitative data. Let's start with qualitative data. Remember, when we talk about qualitative data, we're talking about different qualities that the tree has. One quality that we can see is that this tree is green; that's a non-numerical description of it. This tree has big leaves. This tree is an oak. All of these are examples of qualities of the tree that we don't associate with a number. This tree is healthy,
right? A tree is either healthy or unhealthy. Let's think about one more characteristic that we can use as qualitative data: this tree is small. These are all descriptors or characteristics of the tree that we can use to discuss it without using numerical values that need to be measured.

Now let's compare that with quantitative data. I urge you to pause here and see if you can come up with a list of five different numerical descriptions, because now we're on to quantitative. Remember: qualitative means a quality, characteristic, or category; quantitative means numerical values that we can measure. We're going to make these numbers up, because we don't know any of this, so I urge you to make yours up as well. This tree is 7.5 feet tall. This tree has 6,342 leaves. What we're doing here is putting numerical values on things that can be measured: we can count the number of leaves, and we can figure out how tall it is. This tree weighs two tons; again, I'm making this up. This tree has 14 branches. This tree is, let's say, 1,362 centimeters wide. All of these are quantitative data, numerical values that we could associate with this tree. So one more time: qualitative, quality; quantitative, numerical values.

Are you ready? Pause right here, take a minute or two to look at each one of these examples, and tell me whether each is quantitative or qualitative. Let's make sure that you are comfortable with the first major split of data. Let's take a look at each one. A person's religion: that is a quality or a category that they have, so this is an example of
qualitative data. The way to think about it is that if you have to measure it, it tends to be quantitative, and if you don't, it can be qualitative. A person's height: that is a numerical value that you have to measure, so that's quantitative. How many miles you live from your parents: this is something that you would have to measure in numerical values, so it's quantitative. Eye color is a quality that you have, so that's qualitative. Temperature, same thing as height: it's something you have to measure, and it's associated with a numerical value, so it is quantitative.

Passing or failing a test: now here's an interesting example. Most of the time we can call these examples either quantitative or qualitative, but sometimes you can make the argument for both. Let's talk about why. First of all, just looking at the categories as binary, we could call pass/fail qualitative. It's a quality; we're not associating a number with it. We're just saying that this person either passed or failed, not what the numerical cutoff is. However, if you wanted to associate numbers with it, say that failing is from 0 to 59 and passing is from 60 to 100, then we could call it quantitative. So most of the time we have a good understanding of what the split is, but in some cases we can easily turn qualitative data into quantitative data, or quantitative data into qualitative data.

Now that we're a little more comfortable determining the two big data splits, we're going to talk about the next data split. We have the ability to break data down even further. For quantitative data, we can break it up into two further
buckets. Remember, quantitative data is the numerical data. We can break it up further into two buckets: one we call discrete data, and the other we call continuous data. The general idea behind the split is that when you have a quantity of something, you can either count it or measure it. If it's something that you can count, the data is considered discrete. If it's something you have to measure, the data is considered continuous.

So discrete data describes quantitative values that are finite or countably infinite. With discrete data, we can count the values on our fingers, assuming that we have an infinite number of fingers. We're going to talk in a few minutes about examples of discrete data.

Now, continuous data. Continuous data also describes values that are quantitative, but it results from infinitely many possible values where the collection of values is not countable; it can't be counted because it's on a continuous scale. The idea is that continuous data sets need to be measured, while discrete data sets can be counted. So one we can count, and one we can measure. We're going to talk about what this continuous scale means in a second, but first let's take a look at some examples.

Consider the following situations involving quantitative data. Let's take a scenario where we are the CEO of a large airline. We're going
to refer to the airline as BMCC Express. Here is some of the quantitative data that BMCC Express wanted to collect about its airline:

First question: how many delays did our airline have last year?
Second question: on average, how many gallons of fuel did our flights use to get from Philadelphia to San Antonio?
Third question: how many flight attendants do we have on our flight from Miami to JFK?
Fourth question: what was our cruising altitude on our flight to Philadelphia from Boston?
Fifth question: what time did we take off, and what time did we land, on the flight from Miami to JFK?
Sixth question: how many passengers had to be bumped last year?

All of these questions are quantitative, because they all represent numerical measurements, but we can split them further. We're going to figure out which of these represent a discrete data set and which represent a continuous data set. Keep in mind that discrete data sets are ones that we can count. If you have 20, 30, or 400 million of something, it would take a long time to count, but we have the ability to do it if we wanted to. Continuous data sets are the ones that can take on an infinite number of values, and we tend to need a tool to measure them.

So let's talk about what it means to take on an infinite number of values. It means that something is placed on a continuous scale. For example, think about milk. When we buy milk, we buy it in terms of gallons. Depending on how much you like it or how much you use, most people get the two-gallon containers of milk. But the idea is that if we were to measure a two-gallon container of milk, it wouldn't have exactly two gallons. Normally, what you do when
you're trying to fill up the milk is get as close to two as you can. It could have something like 1.99832 gallons in it if we're going a little bit under, or it could have something like 2.0857943 gallons in it. The idea is that this decimal could go on forever and ever. That's what makes it continuous: you can have an infinite number of decimal places. Also remember that because it's continuous, we need something to help us measure it. We need a measuring scale to figure out gallons; we can't tell what a gallon is without measuring it. So an example of a variable on the continuous scale would be gallons of milk. And again, the reason it's on a continuous scale is that if you're trying to figure out what two gallons of milk looks like, it's really two point something, with a string of decimals after it. That string of decimals is what makes it continuous. The decimals can go on forever, but we end up rounding to two.

So let's take the examples that we just went over and talk about whether they are continuous or discrete. Again, the question I want you to ask yourself as you go through them is: can I count it, or do I need to measure it? I urge you to pause right here, read each one, answer them yourself on your paper, and then go over them with me. So pause right now.

Okay, welcome back. Let's take a look at each one and ask whether the variable is continuous or discrete. First question: how many delays did our airline have last year? Well, let's say we had 438 delays, or let's say we had 1,234 delays. These are all numbers that you can count on your fingers, assuming you had 438 fingers or
assuming you had 1,234 fingers. We don't need a tool to help us measure how many delays we had, so this is an example of something that's discrete. There's no decimal or fraction after it; we hit integer values on the number line.

Next example: on average, how many gallons of fuel did our flights use from Philadelphia to San Antonio? This is continuous. We just talked about gallons: we need something to help us measure them. Let's say we used 2,398 gallons. Remember, it's probably not exactly 2,398; maybe it's 2,398.015 gallons. The fact that those decimal places could keep going on and on makes it continuous data.

Number three: how many flight attendants do we have on our flight from Miami to JFK? Again, this is something you can count on your fingers. We had 13, or eight, or nine; whatever it is, we don't need fractions or decimals to describe it, so this is discrete.

Number four: what was our cruising altitude on our flight to Philadelphia from Boston? Cruising altitude is how high up in the sky we are. Let's say our cruising altitude is 2.5 miles. The first thing we should know is that altitude is something we need help measuring, so we need an instrument; I think it's called an altimeter. And remember, when we say we're cruising at 2.5 miles, we're probably not cruising at exactly 2.5 miles; we're probably cruising at 2.5002386 miles. So again, it's continuous, because it's measured on a continuous scale: there are some decimal
values that can follow it.

Number five: what time did we take off, and what time did we land, on the flight from Miami to JFK? Time is an interesting example. Do we need help measuring time? Sure, we need clocks. And if you say we took off at, let's say, 1:39, we didn't take off at exactly 1:39; we probably took off at 1:39 and 0.0124 seconds. So time is measured on a continuous scale as well.

The last example: how many passengers had to be bumped last year? Well, people are discrete; we can count people on our fingers. You can say 321 passengers had to be bumped, or 409, or seven. This is something we can count on our fingers, assuming we have 321 fingers or 409 fingers, and we know we have at least seven fingers. So this is an example of discrete data.

Okay, so today we talked about an introduction to statistics. The main things I want you to take away are these. First, for the first half of the semester we're really going to focus on descriptive statistics, which tell us how we can graph and visualize data and how we can use a few numbers to describe the data we're looking at. Second, once we start talking about data, the data is broken up into a few different buckets. Data can be broken up into qualitative and quantitative, so you should know the main differences between the two. And we can split quantitative data up even further into continuous and discrete.

Okay, I will see you next class.
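
As a rough recap of the two data splits from this lecture, here is a minimal Python sketch. The names `Kind`, `classify`, and `pass_fail`, and the hand-assigned answers in `examples`, are illustrative assumptions rather than part of the course materials. The code only encodes the lecture's two questions: is it a numerical measurement, and if so, can it be counted or does it need to be measured?

```python
from enum import Enum


class Kind(Enum):
    QUALITATIVE = "qualitative (categorical)"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


def classify(is_numeric_measurement: bool, is_counted: bool = False) -> Kind:
    """Apply the lecture's two questions in order.

    1. Is the value a numerical measurement? If not, it is qualitative.
    2. If it is, can we count it (discrete) or must we measure it (continuous)?
    """
    if not is_numeric_measurement:
        return Kind.QUALITATIVE
    return Kind.DISCRETE if is_counted else Kind.CONTINUOUS


# Answers are entered by hand, following the lecture's reasoning;
# nothing here is detected automatically.
examples = {
    "jersey number 23": classify(False),  # a number, but it is a label
    "eye color": classify(False),
    "number of delays last year": classify(True, is_counted=True),
    "gallons of fuel used": classify(True),
    "number of flight attendants": classify(True, is_counted=True),
    "cruising altitude": classify(True),
    "takeoff time": classify(True),
    "passengers bumped last year": classify(True, is_counted=True),
}


def pass_fail(score: float, cutoff: float = 60) -> str:
    """Turn a quantitative score into a qualitative category.

    The default cutoff follows the lecture's example:
    0 to 59 is failing, 60 to 100 is passing.
    """
    return "pass" if score >= cutoff else "fail"


for name, kind in examples.items():
    print(f"{name}: {kind.value}")
```

The `pass_fail` helper mirrors the pass/fail discussion: the same test result can be stored as a number (quantitative) or collapsed into a category (qualitative), depending on what you choose to record.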