Hi everyone, this is Matt with intro stats. Today we're looking at one of our first topics in intro stats. Last time we introduced what we mean by intro stats and some of the key topics in the class, and one of the key things we talked about was that statistics is about analyzing data. We said that data is thought of as information in all its forms. Today we're going to learn that there are two main types of data.

This is actually super important, because a lot of the techniques we use in statistics are tied to what type of data you have. We don't analyze categorical data the way we analyze quantitative data, and relationships between two categorical data sets are analyzed differently than relationships between two quantitative data sets. So knowing what type of data you're dealing with is really important, and it lays the foundation for a lot of the other things we're going to learn.

All right, let's get right to it: the two types of data. The first type is called categorical data. Some people call it qualitative data. I usually say categorical, but you will see stat books that call it qualitative. Categorical data is labels or words that describe the people or objects in the data. A lot of the data sets we use in intro stats are stored and summarized in Excel spreadsheets, or in some other spreadsheet program. So think of it like this: if I have a spreadsheet and I look at a column of categorical data, that column is usually made up of words.

Let's look at some examples. If I asked people in the US what
state they're from, one person says California, another says Arizona, another says Colorado, another says New Mexico, and another says Colorado. Notice that the data set is really made up of words (in this case, abbreviations of words) describing something about each person.

A very common categorical question is a yes/no question: do you smoke cigarettes, yes or no? Do you have at least one tattoo, yes or no? If I ask people whether they have at least one tattoo, this person says yes, the next says no, the next says no, another says yes, another says no, and so on. Again, the data set is made up of words, so it's categorical data.

I could also ask people what month they were born in. This person was born in March, this person in June, and others in October, January, August, June, and so on. Again, the column of data looks like a bunch of words, and that's how you know you're dealing with categorical data. As we go through the techniques for analyzing data, one of the first things I'm always going to ask is what type of data we're dealing with. A data set made up of words is categorical data; it's a description.

Now, there is an important note. Sometimes you'll see people use numbers in place of words, because they don't want to write the words out every time. For example, with yes/no questions, instead of writing the word yes, they'll write a 1, and maybe no is a
2, or sometimes you'll see 0s for no and 1s for yes. Those are numbers, but they're not really measuring anything. They stand in for a word: the number 1 means yes and the number 0 means no. When someone uses numbers in place of words, that's still considered categorical data; the numbers are basically just placeholders for words.

Same thing with birth months. Instead of writing March, someone might just write 3, since March is the third month of the year. June becomes 6, October 10, January 1, August 8, and June again 6. Because these are numbers, you might think this isn't categorical data, but they're numbers in place of words, so we would still treat and analyze this data like categorical data. You have to be really careful about that.

A good example to think about is a zip code. A zip code is a number, but it's really a number in place of describing where you live. In the US, a zip code tells us the general area where you live. So even though it's a number, it's not counting or measuring anything, and that's an important distinction. It's almost like a number in place of a written description of where you live.

The other ones that can be very tricky are what we call identification
numbers, like a driver's license number, Social Security number, student ID number, or health insurance ID number. These are numbers, but again, they're just used to identify you. They're not used to measure or count something; they're not measurement data. That's an important distinction: not all numerical data is quantitative, and some of the numerical data you see might be categorical.

A lot of people put identification numbers in their own group, because each one is really a group of one: it just identifies you. With ordinary categories, I usually want to group people, see how many were born in March, group all the people born in October, and so on. ID numbers don't group people like that, so they're kind of a special case, but I still would not treat them as a quantitative data set. They're not numbers that measure or count something; they're used almost as a category, a category of one. So they still fall under categorical data.

If your data isn't categorical, the other type is quantitative data. Quantitative data is numerical measurement data: numbers that actually measure or count something. If you're ever unsure whether data is categorical or quantitative, a good thing to ask is: do the numbers have units? Units are a good sign that you have quantitative data. Also ask: would I be interested in taking an average, and would an average make sense? Let me ask you a question: would I take the average of zip codes? If I did, that would just be weird. It wouldn't tell me anything, and it certainly wouldn't tell me where the average person lives. So zip codes are not quantitative, and I wouldn't take an average of them.
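A minimal Python sketch of the distinction above, assuming the short lists below stand in for spreadsheet columns from this lesson's examples (the variable names and the month-name tuple are illustrative, not part of the lesson). It shows that categorical data is summarized by counting how many people fall in each category, and that averaging numeric codes runs without complaint but produces a meaningless "average month."

```python
from collections import Counter
from statistics import mean

# Categorical data from this lesson's examples: each entry is a label describing a person.
states = ["CA", "AZ", "CO", "NM", "CO"]
tattoo = ["yes", "no", "no", "yes", "no"]

# Birth months recorded as numbers in place of words (3 = March, 6 = June, ...).
month_codes = [3, 6, 10, 1, 8, 6]
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

# Categorical data is summarized by counting how many people fall in each category.
state_counts = Counter(states)
tattoo_counts = Counter(tattoo)
# Translate each code back to the word it stands for before counting.
month_counts = Counter(MONTH_NAMES[code - 1] for code in month_codes)

# Python will happily average the codes, but the result is not a meaningful month.
average_month_code = mean(month_codes)

if __name__ == "__main__":
    print(state_counts.most_common())
    print(tattoo_counts.most_common())
    print(month_counts.most_common())
    print(round(average_month_code, 2))
```

Counting works the same whether the months are stored as words or as codes. The "average month" of about 5.67 lands somewhere between May and June and says nothing about when these people were born, which is the same problem as averaging zip codes.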
A lot of times with a quantitative data set, you're looking at some type of average, or at counts or frequencies from various places or groups. So ask yourself: is it a count or a frequency? Does it have units? Does an average make sense? Are the numbers measuring something?

Here are some prime examples. I might ask people their height, or measure them with a tape measure, maybe in centimeters or in inches. In the US we use inches a lot, so say one person is 61 inches, the next is 64.5 inches, the next is 72 inches, the next is 68 inches, and so on. Notice this data is numbers, and it's measuring something: the heights of people. It also has units, inches, and I'd really be interested in the average height of these people. I could figure out that average, which also tells me this data set is quantitative.

Here's a good example of counts, because a lot of times quantitative data are counts. I might look at the number of cars in various parking lots. Maybe one parking lot had 23 cars, another had 11, another 47, and another 138; in another, hardly anybody was parked, only 9 cars; and another was humongous, with 217 cars, and so on. This is a count from each parking lot: the data set is made up of numbers, and the numbers really are counting something. It's not like the birth-month numbers, which just described what month someone was born in. These numbers actually count or measure something, so this data set is quantitative. I'd also think about what the average number of cars would be across all these
parking lots; an average makes sense here. Those are some things to ask yourself when you're on the fence about whether data is quantitative or not.

So, two types of data: categorical data and quantitative data. Make sure you can tell the difference; that's really important. All right, this is Matt with intro stats, and I'll see you next time.
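A similar sketch for the quantitative examples, using the heights (in inches) and parking-lot car counts from this lesson. Python's `statistics.mean` is used here as an assumption standing in for whatever tool you'd actually use, such as a spreadsheet's AVERAGE function; the variable names are illustrative.

```python
from statistics import mean

# Quantitative data: numbers that measure or count something.
heights_in = [61, 64.5, 72, 68]            # measured heights, in inches
cars_per_lot = [23, 11, 47, 138, 9, 217]   # counts of cars in various parking lots

# Averages make sense here because the numbers carry units (inches, cars).
average_height_in = mean(heights_in)
average_cars = mean(cars_per_lot)

if __name__ == "__main__":
    print(f"average height: {average_height_in:.3f} inches")  # average height: 66.375 inches
    print(f"average cars per lot: {average_cars:.2f}")        # average cars per lot: 74.17
```

Unlike the birth-month codes, these averages come with units and answer a sensible question: a typical height of about 66.4 inches and about 74 cars per lot.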