Well, hello students, welcome to the Chapter 4 lecture. This chapter is on measures of central tendency, such as the mean, median, and mode, and I will also discuss how those measures can be used in research toward the end of the chapter. I will say that, for the most part, the content in this chapter is the content you are probably most familiar with; of all the chapters in the textbook, this is the most likely to be review. Let's go ahead and walk through it.

So what are we going to talk about today in this lecture? Again, measures of central tendency, and a little bit about how those measures can be used statistically or in research. We'll just touch on that, because it really applies to later chapters, but we'll talk about it a bit today.

We'll begin with notation. We have here the "sum of," or sigma (Σ). When sigma is used like it is here, ΣX means to sum, or add, all the X scores; "sum of X" is how it's stated. So that's the new notation.

Let's talk in general about what central tendency is. Central tendency is a measure of the midpoint, or center, of a distribution of scores. Basically, if you have a distribution of scores, it tells you about where your scores lie on that distribution, because it's a measure of where the scores center.

First, the mode. The mode indicates the score that occurs most frequently. It's typically used when you're using a nominal scale of measurement, a grouping variable if you will. These types of scales of measurement were introduced in previous chapters, so if you're confused as to what a nominal scale of measurement refers to, go back to the previous lectures. Here's an example of nominal, though: we're asked to judge, from five options, what our favorite fast food restaurant is. In this
particular example, five people said In-N-Out, ten people said Burger King, two Wendy's, three McDonald's, and four Carl's Jr. So the mode is used when you have a variable like that, a nominal variable. Here the mode would be Burger King, since it was chosen most often.

If you have a unimodal distribution, what you have is really one hump: that's when there is a single score that occurs most often. Now, you may recall from the last chapter the idea of a bimodal distribution. This is when you have two scores that are tied for most frequent. You can also have multimodal, by the way; that's when you have two or more modes, scores that occur equally and most frequently. You can see with this distribution here, the score 5 and the score 9: if you go over here to frequency, you can see that there were three folks who scored 5 and three folks who scored 9, so they tie for most frequent. So here you have a bimodal distribution.

Next, the median. The median is the middle score. Do you recall percentiles from the last chapter? It's the score at the 50th percentile, so half the scores are above and half the scores are below. The median is sometimes used to summarize ordinal data, but also highly skewed interval or ratio scores. So unlike the mode, which is used with nominal data, the median is for ordinal or highly skewed data.

Let's first of all talk about how you calculate the median. If the scores are normally distributed, if you have that perfect normal distribution as we saw in the last chapter, the median and the mode would be the same, so in that case it's easy enough to identify the median. But when they're not normally distributed, you have to follow a procedure. You arrange the scores from lowest to highest, because again you have to identify the score in the middle. Now, that's easy if there is an odd number of scores, because the median would be the score in that middle position, but if
there is an even number of scores, you actually have to take the average of the two middle scores in the distribution.

Let me give you an example. Consider this set of data: you need to find the mode and the median. The easiest way to approach it when you have a small number of data points, as you do here (it's a little more complicated if you have a large number), is to order them in sequence from lowest to highest. Let's look at the mode first. It's very easy to see that there are three scores of 6 and three scores of 8, so there you have it: bimodal, two modes, 6 and 8. That would be the correct answer to the question about the mode.

Now, how many scores do you have here? One, two, three, four, five, six, seven, eight, nine, ten: ten scores. If you had an odd number, you could of course just identify the middle score as the median, but since we have an even number, we have to take the average of the middle two scores. So your median would be 6 plus 7, divided by 2, which is 6.5. That's how you calculate the mode and the median.

Now let's move to the measure of central tendency that is used most frequently, and that is the mean. I'm guessing most of you are familiar with the mean. It's the mathematical center of the distribution. We'll talk about how it's calculated in just a second, but it's used to summarize interval and ratio data. Here's a caveat, though: that's when the distribution is symmetrical. When it's skewed, as we'll see in a minute, the median or some other measure of central tendency may be more appropriate.

Here's the thing with the mean: if we don't know anything else about the scores, our best predictor of how an individual will score
is the mean. Let's say you go to take a math test, and you've heard that the mean score on the math test is 70. What's the best predictor of how well you will score? The mean, 70. That's the best predictor. Knowing nothing else, the mean is our best predictor.

All right, so the formula for the mean. Here's how it's typically represented (sometimes you'll see an uppercase M), and it's the sum of all the scores over the number of scores, over N.

Flipping around: if you have a perfect normal distribution, all three of the measures of central tendency that we've discussed so far are going to be at the same score. We said a few slides ago that with a perfect normal distribution the mode and the median will be the same, but now let's add the mean to that; it will also be located at the same score.

What if you have a skewed distribution? We saw these skewed distributions in the last chapter, and you can see that the three measures of central tendency look a little bit different in terms of where they fall if the distribution is skewed. With a positive skew, the mean is greater than the median and the mode, and with a negative skew the reverse is true. Why? Because these high scores right here drive up the mathematical calculation of the mean. So it doesn't represent the middle-most value, or the typical value if you will; not necessarily, not when you have a skewed distribution.

Here's an example: length of hospital stay. We saw this in the last chapter. If you have a skewed distribution where some people stay for many more days than most, that's going to be in the calculation of your average, and it's going to result in a mean that's greater than the mode or the median. We often see this with income too: when you have a distribution that looks something like this,
very skewed, then the mean will be greater than the typical income. In those cases, the median is often used as your measure of central tendency.

Now here's an important concept that you will see throughout the textbook. For most statistical measures, we will talk about deviations around the mean. The mean is often used in the statistics that you're going to be exposed to; in many of them the mean is the measure of central tendency, not always, but more typically than the others. What is also examined are the deviations around the mean. This is an important piece of information, so let's take a look.

What do we mean by a deviation? A score's deviation is the distance between a particular score and the mean for that distribution. It is represented as the score minus the mean. When we are using the mean to predict scores, the deviation around the mean indicates our error in prediction. For instance, on the exam that I just gave you as an example, let's say the mean was 75 this time, so there's an average score of 75 on a particular exam. We use that to predict your score, and you actually score a 90.
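
The deviation just described, the score minus the mean, can be sketched in a couple of lines of Python. This is only an illustration using the exam numbers from the example; the variable names are assumptions and do not come from the lecture slides.

```python
# Exam example: the mean is used as the prediction for an individual score.
mean_score = 75    # average score on the exam (our prediction)
actual_score = 90  # the score actually earned

# A score's deviation is the score minus the mean.
deviation = actual_score - mean_score
print(deviation)  # 15
```

A positive deviation means the score landed above the mean; a negative one would mean it landed below.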
So the difference between 75 and 90, 15 points, is our error in using the mean as a prediction of how well you're going to do on that exam. That's quite a deviation.

Let's look at another example. Let's say we give a test to seven people, and here are their scores. If we want to get an understanding of the extent to which these scores vary, or the predictive value of the mean and our error in using the mean as a prediction for how someone would score, we might look at the deviations around the mean. Let's take for instance the score of 5. Well, let me step back for a second. First of all, we need to calculate the mean. For these scores we would simply add up all the values, which is 39, divided by 7 (that's N, the number of cases), and our average then is 5.5714.

If we are going to examine any score's deviation from the mean, we take that score minus the mean. So here is our mean and here are the deviations. 5 minus 5.5714 is negative 0.57; that's the difference between the score and the mean. Let's look at the next one: 9 minus 5.5714, and the deviation is 3.43, and so on. You can do that for all seven scores, and these are all the deviations from the mean for each of these scores.

Now, what's interesting about these deviations is that if you sum them, the sum will be zero, because the positive deviations of the scores above the mean exactly cancel the negative deviations of the scores below the mean. Going back to the previous slide, which I skipped over: the sum of the deviations around the mean is always going to be equal to zero. And here are the symbols we would use to represent the sum of the deviations around the mean: you have the deviations around the mean, X minus the mean, and then you sum all those deviations, and that sum will equal zero, just as we saw right here. So this is actually
our error in using the mean as a prediction. What you're going to see with many statistics is that when we become interested in variability, which is an important attribute of any statistic, we're going to be interested in these deviations around the mean. You'll see more of that in the next chapter.

Now let's talk about what we do with this information, these means. You may recall from a chapter early on that there is a difference between a population and a sample. When we do research, we're interested in a population, say all U.S. adults. Typically, though, in our experiment or survey we're unable to test every member of the population, so we pull out a sample and we test that sample. The idea is that we take the results from the sample in our experiment and make an inference about the population based on that sample. That's the general overview.

The symbol for the population mean is mu (μ). If all the population values were known to us, if we were able to test every member of, say, the U.S. population on whatever variable is of interest, and that was our population of interest, then we could calculate the population mean. We may have a smaller population, and in some cases we may be able to calculate the population mean, but in most cases the population is so large that we have to make an inference about the population mean from the sample. Again, more on that in future chapters. If we were to calculate the population mean, this would be the formula: it would be equal to the sum of all the X's divided by N. As I said, sometimes we are able to do this, but oftentimes in research we are not.

So what do we typically end up doing? Let's talk about a typical experiment, just to give you an illustration of how a
mean might be used. Let's say we are interested in memory and how accurate folks are in remembering words on a list, and we believe that accuracy depends on the length of the list. That is, our hypothesis might be that as the length of the list increases, there are more errors in memory for the items on the list. In this case our independent variable would be the length of the list (I'll show you what I mean by that in just a second), and the dependent variable would be the number of errors.

How might that look? Like this. Let's say for our independent variable, length of list, we had three different groups, three different lengths of lists: a list with 5 items, a list with 10 items, and a list with 15 items. So we have three levels of the independent variable, length of list, and then we measure the number of errors. Let's say we have 100 people per group, so 100 folks from the population we're interested in, U.S. adults. To see how the length of a list affects memory, we pull out a sample: we give a hundred people a list with five words, another hundred people a list with ten words, and another hundred people a list of fifteen words. We test their memory for the words on the list, and what do we use as our measure of performance? Mean recall errors.

So what do we find here? Folks who have the shorter list produce fewer errors, with the mid-size list, the 10 items, a few more errors on average, and with the longer list, more errors. That's very intuitive; that's probably what we would expect. But the point here is that the mean, the average, is used to describe the performance in each of these groups.

Now, if we were interested, for our independent variable, in something like your college major, that would be a nominal variable. We could do that, and if we were to plot out our results we would use a bar graph
when the independent variable is nominal or ordinal. So in this case, instead of length of list, we have three majors: 50 physics majors, 50 psych majors, and 50 English majors. We give them a list of words, and we're not varying length of list here; we're looking at college major to see if it affects mean recall. We find that the physics majors had about eight errors on average, psychology four, and English twelve errors on average. So if your variable is nominal or ordinal, you're going to graph your results, your means in this case, using a bar graph.

Now, what happens next? This is what I explained a few minutes ago. We compute a sample mean for each of our levels or conditions, and that describes the relationship among the groups in the sample. Then we perform an inferential procedure, like maybe a t-test or analysis of variance. You're not expected to know what that is at this point; those are future chapters. But we perform some kind of statistical test that results in our ability to make an inference about the population based on what we saw in the sample. So that's how sample means, how the concept of a mean, is used for statistical analysis.

All right, so that is about it for the lecture. There are some additional slides here, but these are examples that I would prefer to discuss during our class time. All right, that's it. I'll see you at the next lecture. Bye.
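
As a recap of the calculations walked through in this lecture, here is a minimal Python sketch using the standard library's `statistics` module. The fast food counts come from the lecture. The ten-score and seven-score data sets are hypothetical: the actual slide values are not in the transcript, so they were chosen only to match what was described (three 6s and three 8s with a median of 6.5; seven scores that include a 5 and a 9 and sum to 39).

```python
from statistics import mean, median, multimode

# Favorite fast food restaurant counts from the lecture (nominal data).
votes = {"In-N-Out": 5, "Burger King": 10, "Wendy's": 2,
         "McDonald's": 3, "Carl's Jr.": 4}
restaurant_mode = max(votes, key=votes.get)  # most frequently chosen category

# HYPOTHETICAL ten scores matching the lecture's description:
# three 6s, three 8s, and middle values of 6 and 7 once ordered.
ten_scores = sorted([8, 6, 2, 7, 8, 6, 9, 3, 6, 8])
modes = multimode(ten_scores)   # all scores tied for most frequent
middle = median(ten_scores)     # averages the two middle scores when N is even

# HYPOTHETICAL seven scores that sum to 39 (N = 7), as in the lecture.
seven_scores = [5, 9, 3, 7, 4, 6, 5]
m = mean(seven_scores)                       # sum of X divided by N
deviations = [x - m for x in seven_scores]   # each score minus the mean

print(restaurant_mode)               # mode of the nominal variable
print(modes, middle)                 # modes and median of the ten scores
print(round(m, 4))                   # mean of the seven scores
print([round(d, 2) for d in deviations])
# The deviations sum to zero, up to floating-point rounding.
print(abs(sum(deviations)) < 1e-9)
```

With any other set of scores, the last line should still print `True`, since the deviations around the mean always balance out.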