Transcript for:
Measures of Central Tendency

[Music] Measures of central tendency. We have visited this table before, in the previous session, and as we move into the topic of measures of central tendency, let's revisit it and use it as a guidepost in learning about descriptive statistics. We talked about the different types of variables listed in the first column: nominal, ordinal, interval, and ratio. And we talked about individual scores. What do we do to organize them? We hang similar numbers together, and that's what a frequency distribution is. We also talked about calculating a percentile within that topic.

Now that we know how numbers hang together and what the frequencies look like, we are interested in knowing what the average, or midpoint, is, and that takes us to this topic of measures of central tendency. Under this topic we're looking at three different types of averages: mode, median, and mean. Depending on the type of variable we have, we use different ways of measuring the central tendency. So "measures of central tendency" is a fancy term for average, and average is defined by any of these three ways that we measure. In the next sessions we will talk about how scores vary from one another and how variables correlate with one another, but first let's talk about the three averages.

We talked about a variable as a measurable factor that has an effect on a phenomenon or phenomena, so anything that's measurable and has categories. These variables fall into one of two types: categorical or continuous. Categorical variables have categories that use words, and continuous, or numerical, variables are measured using numbers. More specifically, under categorical data we're talking about nominal or ordinal types of data, and ordinal takes on an added feature of order or rank built into the categories. Then we have the continuous type, which is more specifically
interval or ratio. So let's go through these variables to recap what types of variables we're talking about. Again, understanding the various types of variables is very important: as you saw in the previous slide with the table, once we know the type, we know which measure of central tendency to report.

Let's look at these variables. Scores, 0 to 100: what type of variable would this be, nominal, ordinal, interval, or ratio? Before we ask that question, let's see if we can decide whether a variable is categorical or continuous, and then we can go into the specifics. With scores 0 to 100, is it categorical data or continuous data? It would be continuous, because it's numerical. More specifically, continuous data is either interval or ratio. Which one would that be? The difference between interval and ratio is the non-arbitrary zero: a ratio variable has a non-arbitrary zero that means the same thing to everybody. So if these scores range from 0 to 100, and zero meant 0% or zero correct, that means the same thing to everybody. We would say, then, that it's continuous data and it is ratio.

How about favorite colors? Would this variable be categorical or continuous? Categorical. And more specifically, nominal or ordinal? Nominal, because we're looking at different categories that are colors, and these categories have no rank or order. So it's the least complex type of data, which is nominal data.

How about age? Is that categorical or continuous? Continuous, because age ranges from 0 to 100 or 120, or however far out you want to take it. And more specifically, is that interval data or ratio data? It would be ratio data, because age begins at 0, and 0 means the same thing to everybody: the beginning of a life. And gender would be nominal: it's categorical, and nominal because it has
just categories.

How about income? Categorical or continuous? It's numbers, right? So it's going to be continuous. And more specifically, interval or ratio? It would be ratio, because if you say you make a certain amount of money per hour, that carries its meaning based on a zero-dollar scale. You might argue that if you are talking about a population of people who are working, and no one is receiving zero dollars, then it's interval data. But in its unaltered, general form, it would be ratio.

How about grades? Let's think about grades as A through F, the letter grade that a student gets in a classroom. That would be categorical instead of continuous. Would that be nominal or ordinal? A through F has some kind of rank built into it, so we would say that it's categorical and ordinal data.

How about types of cars? Categorical or continuous? Categorical. More specifically, nominal or ordinal? Types of cars is just types of cars; there's no order built into it unless you added the rating or the cost. So it's just nominal data.

How about height? Categorical or continuous? Height is numerical, so it's continuous data. And more specifically, interval or ratio? It would be ratio, because there is a beginning point on the scale that we use to measure our height, starting with zero. Similarly for weight: this is continuous data, and ratio, because it's also a pre-existing scale that we use to measure ourselves.

How about pain? You remember the pain scale that we would typically find in a hospital room, say from 0 to 10, for rating pain. Is that categorical or continuous? If we think of 0 to 10, it'd be numerical. Is it interval or ratio? It would be interval, because on that pain scale 0 means different things to different people.

Okay, very good. This helps us to revisit and review the different types of variables. So let's go back to the chunk here of the table that we saw in the
very beginning. I shaded off the individual scores column so we can just look at the types of variables and measures of central tendency together, to see which one goes with what. For the nominal type, it says to use the mode. For the ordinal type, it says to report the median. And for interval or ratio, which together are continuous data and can be either skewed or normal, it says to use the median for skewed data and the mean for normal data.

So let's define what these three measures of central tendency are. Mode is defined as the most frequently occurring category, so it's the most popular answer if you had a list of answers. Here's a quick example: if you have the scores 2, 3, 7, 7, 8, what would be the mode, the one that occurs the most? That would be 7, because 7 occurs twice.

When we think about average, we usually think of the mean, right? The way we calculate the mean is to add up all the scores and divide by the number of scores. So if we were to look at the numbers 5, 6, 7, 8, and 9, to find the mean we would add all the numbers together and divide by the total number of scores, which in this case would be 5. All these numbers added together (35), divided by 5, would give you the mean of 7.

Median is typically known as the middle number. If you see a list of numbers in sequence like this, you would go right to the middle number, and that would become your median. In the case that you have an even number of scores, let's say there's another number here and the middle falls between two numbers, then you would just take the mean of these two middle numbers.

So, normal or skewed? Let's talk about that. A normal distribution looks something like this, where you have a graph that looks like a bell, and so a normal distribution is also known as a bell curve. If you were to fold the graph down the middle so the lines match, one side would be symmetrical, or identical, to the other side. So a normal distribution is a symmetrical distribution with no skew, with the
hump right in the middle of the graph and tails that look identical to each other. Any shift from this normal distribution is what we call a skewed distribution.

Here are some examples of a skewed distribution. On the left side, a positive skewed distribution has a graph with a hump that has shifted to the left side, leaving a long tail to the right, or positive, side. You would see that the hump has shifted from the middle, unlike the normal distribution, and the tails look different: there's no tail on the left side but a long tail to the right, or positive, side. Thus we call this a positive skewed distribution. We want to focus on the tail, and since the tail is on the positive side, we call that a positive skewed distribution. This would be an extreme case of a skewed distribution.

Then you have one that looks opposite to this, on the right, called a negative skewed distribution. You see the hump shifted to the right side, leaving a very short tail on that side but a long tail to the left, or negative, side, and we call this a negative skewed distribution.

In extremely skewed distributions like this, if we wanted to see where the mode, median, and mean fall, it would look something like this. The mode is where the hump is, because if you imagine a vertical line of frequencies, where the line is at its peak would be your mode. Mode is defined as the most popular answer, or the most common data. That's where you find the mode.

But here's an interesting place for the mean. You recall that the mean is found by adding up all the scores and dividing by the number of scores. So what happens when you have a long tail in a positive skewed distribution, where some really high numbers create this long tail? Since we calculate the mean by adding all the numbers and dividing by the number of scores, we have to include all those large numbers in the calculation. What happens then for the mean is that it just becomes
much more affected by the large scores, and it shifts further out toward the larger numbers, creating an average that may not really represent the entire data.

And then you have the median. The definition of median is the middle number, and it's easy to remember on a graph like this: the median falls in the middle, between the mode and the mean. The reason it falls right here, much lower than the mean on this horizontal line, is that with a median you're looking for the middle number in a sequence of numbers. From one end to the other, you go up just until you reach the halfway point. So it doesn't matter how high these high numbers are; your median is not affected by them. In a positive skewed distribution, you see that the median is a better average, because it represents where the vast majority of the data is. So that's a better average than the mean.

Before we get to the negative skewed distribution, here's an expression for the placement of these three averages. We would say that on this horizontal line, the mean is a higher number than the median, and the median is a higher number than the mode. So the way to know whether the data is skewed positively is to look at the mean and the median: if the mean is greater than the median, then we would say the data is positively skewed.

In a negative skewed distribution, of course, the mode is where the hump is, similar to the positive skewed distribution. The mean is pulled down because of the low numbers here, and again therefore does not represent the entire data. The median only counts from the lowest point halfway up to find the middle number, and so the median is a better average in a negative skewed distribution. In terms of the placement of numbers, we would say that on this horizontal line the mean is a smaller number than the median, because it's closer to zero, and the median is a
smaller number than the mode. So anytime you see a mean that is smaller than the median, we would say the data is skewed in the negative direction, opposite to what we talked about here. In the case where the mean, median, and mode are identical, we would say it's a normal distribution, and in this case we would report the mean.

So let's add another layer to this drop-down kind of diagram that we started out with. The first step is to look at the variable. We know that a variable is either categorical or numerical, and categorical can more specifically be either nominal or ordinal. When you have nominal data, or ordinal data measured in words, we simply report the mode. On the other hand, numerical data uses numbers and can more specifically be interval or ratio. Rather than trying to figure out whether it's specifically interval or ratio, the critical question to ask is: what is the distribution? If the distribution is skewed, then we would report the median. But if the interval or ratio data are normally distributed, then we would report the mean. [Music]
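
For anyone who wants to try the session's examples on a computer, here is a minimal Python sketch using the standard-library `statistics` module. The helper names `skew_direction` and `average_to_report` and the `incomes` list are made up for illustration; they are not from the session. The sketch also assumes a simplified version of the rule of thumb above: it only compares the mean with the median, and it treats the data as unskewed only when the two are exactly equal, which real data rarely are.

```python
from statistics import mean, median, mode


def skew_direction(scores):
    """Compare the mean with the median (the session's rule of thumb)."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"  # long tail of high scores pulls the mean up
    if m < md:
        return "negative"  # long tail of low scores pulls the mean down
    return "none"          # mean equals median: treated here as unskewed


def average_to_report(values, categorical=False):
    """Follow the closing diagram: the mode for data measured in words,
    the median for skewed numbers, the mean otherwise."""
    if categorical:
        return "mode", mode(values)
    if skew_direction(values) == "none":
        return "mean", mean(values)
    return "median", median(values)


# Examples taken from the session
print(mode([2, 3, 7, 7, 8]))           # 7 occurs twice
print(mean([5, 6, 7, 8, 9]))           # 35 divided by 5
print(median([5, 6, 7, 8, 9]))         # the middle number
print(median([5, 6, 7, 8, 9, 10]))     # mean of the two middle numbers, 7 and 8

# Hypothetical data: one very large value creates a long positive tail
incomes = [20, 22, 25, 27, 30, 250]
print(skew_direction(incomes))         # mean (about 62.3) > median (26.0)
print(average_to_report(incomes))
print(average_to_report(["red", "blue", "red"], categorical=True))
```

With the hypothetical `incomes` list, the single value of 250 pulls the mean far above the median, so the sketch labels the data as positively skewed and reports the median. In practice you would look at the shape of the distribution rather than rely on an exact equality check.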