Transcript for:
Understanding Mean and Median in Statistics

The goal of this section is to find a parameter that summarizes the position of a data set, one that tells us where the center of the data is. This is a little tricky, because the center of a data set isn't a well-defined or obvious thing. Depending on who you ask, you may get a different answer about what the center of the data is.

In fact, we've already talked about one way to find the center of a set of data: the median. This is quite literally the center, in the sense that if you line up all the data values in order, the data value in the middle is the median. It's a good measure of the center of a set of data, but it has some limitations. In particular, for us, the way we find the median is a procedure, not a mathematical formula, and that makes it a little hard to analyze the data mathematically. So we want a different measure of the center of a set of data.

The measure we're going to use is the mean. The mean, like the median, is a measure of the center of a data set. We find the mean of a data set by adding all the data values together and dividing by how many there are. It's a fairly simple formula; we'll express it more mathematically in a little bit.

You may know the mean by another name: most people call it the average. Technically, "average" means something like the usual value, and it doesn't necessarily mean the mean of a data set. But most of the time when we talk about an average, what we mean is the mean of a data set.

So the mean and the median are both measures of the center of the data. You could also think of each as the usual or common value of the data, the value in the middle. Note, though, that these are not the same,
not the same. Sometimes when you calculate the median and the mean they come out the same, but generally they will be two different values. They are two different ways to measure the center of a set of data. One is not better than the other; they're both useful. Sometimes the median might seem a little better, and sometimes the mean might seem a little better, depending on what we're trying to do, but neither one is always better. They both have their uses.

The real advantage of the mean, and the reason we will tend to use it more in this class, is again that we can represent it and find it using a mathematical formula. We cannot easily describe the median using a mathematical formula. We could describe a procedure, some steps that get us there, but not a formula that we could express using addition, subtraction, multiplication, and division. Because of that, we're better equipped to talk about the mean.

Let's start with a little example to see what this looks like in practice. Here is some data: 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4. Let's find the mean of the data. How do we do this? We know from our basic idea that we add all the numbers together and then divide by how many there are. So we take all these numbers and add them together: 1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4. Then we divide by how many there are. How many are there? One, two, three, four, five, six, seven, eight, nine, ten, eleven data values. So the sum divided by 11 is the formula we would use to find the mean for this set of data.

Notice that we could make this a little easier to work with, and this is one of the things we should know about the mean. We have a lot of repeated addition here. Whenever we have
repeated addition like this, where we add the same number to itself over and over again, we can summarize it using multiplication. 1 + 1 + 1 is adding 1 to itself three times, so that is 3 times 1. I'm adding 2 to itself two times, so that's 2 times 2. I only have one 3, so that's 1 times 3, but I have five 4s, so that's 5 times 4. When we have repeated values like this, we can use multiplication to condense the formula and make things a little easier.

Let's see what this looks like in the R calculator. Take that first formula: 1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4. I've got them all in there, and then I want to divide by how many there are, so I divide by 11. If you click evaluate, it will run and it will look like everything's fine, but you'll get the wrong answer. You have to remember that the calculator runs everything according to the order of operations, and in the order of operations division comes before addition. So the computer looks at that and says: you want me to take 4 and divide it by 11 first, then add all these other numbers to the result. That's not what we want. What we want is to add all the numbers first and then divide everything by 11. On paper, the fraction bar groups everything in the numerator together, just like parentheses do, so all of that happens first. On a computer, you have to be especially careful with division because of this. We want all of this to be in the numerator of the fraction, so I wrap it in parentheses. That tells the computer: add up all these numbers first, then divide all of that by 11.
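As a small sketch of the point about parentheses, here are the two versions side by side in R. The variable names `without_parens` and `with_parens` are made up for this illustration and do not come from the lecture.

```r
# Without parentheses, only the last 4 is divided by 11,
# because division happens before addition.
without_parens <- 1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4 / 11

# With parentheses, the whole sum is the numerator,
# which is what the fraction bar means on paper.
with_parens <- (1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4) / 11

print(without_parens)  # [1] 26.36364
print(with_parens)     # [1] 2.727273
```

The first line runs without any error, which is why the mistake is easy to miss: the result is 26 plus 4/11 rather than 30 divided by 11.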
If I evaluate that, I get my average: 2.727273. I could also use the condensed formula instead. What would that look like in the calculator? That would be 3 times 1, plus 2 times 2, plus 1 times 3, plus 5 times 4, divided by 11. Just like before, I want all of this divided by 11. It's all in the numerator, so I wrap it in parentheses. This is the last row, so it is what will be shown when I click evaluate, and I get exactly the same thing. It really does calculate to the same value.

Notice that on paper, in the numerator, when I'm multiplying things I just use parentheses. We know that when we write things out on paper, a number next to a set of parentheses means multiplication, so 3(1) means 3 times 1. But that's not what it means in the calculator. If you type 3(1) like that and click evaluate, it'll think for a while, and then it gives you this big long error that says something's wrong: "attempt to apply non-function." That's confusing computer wording for "I don't know what you're talking about with these parentheses; they shouldn't be here." Parentheses have a special role in the computer language, so whenever we do multiplication we have to use the asterisk symbol. I could use the asterisk together with the parentheses, and that would be fine, but I can't have just the parentheses. So be careful when entering this into the computer.

So we have the average, the mean. That is one way to measure the center value, and it's going to be the one we use most throughout the rest of the course. Notice that it's not the only way. Again, we can calculate the median by finding the center value. To find the median, we just count in to the center: one on both sides, two on both sides, three on both sides, four on both sides, five on both sides, and 3 is the one in the center. There are five numbers smaller than it, and there
are five numbers bigger than it. Three is the one in the center, so the median is 3. Notice that the two center values, 3 and 2.727273, are kind of close, but they're not the same. The median and the mean are not the same.

Now, the mean is really important, so we give it some special symbols. In fact, we give it different symbols depending on whether we're talking about a population or a sample. Let's first talk about a population. For the population mean we use the Greek letter mu (μ), which looks like a u with a long tail in the front. When we're talking about the mean of a smaller sample, we use a different symbol, because it's going to be important to distinguish between the mean of a sample and the mean of a population. For the mean of a sample we use the symbol x-bar, just an x with a line over it.

This is going to be important because a lot of the time we are really interested in the mean of a population, mu, but it's hard to find. It's often hard to get information about an entire population; that difficulty is a lot of the point of statistics. So we approximate it with the mean of a sample. A lot of the time we can find the mean of the sample and it will be close to the mean of the population, but usually they won't be exactly the same. That's why we use different symbols. For the remainder of this class, whenever you see mu, that's the mean of a population, and whenever you see x-bar, that's the mean of a sample.

We can also write how to calculate the mean of a population or the mean of
a sample in terms of mathematics. Specifically, what do we do for mu? We add up all the data values and then divide by how many there are. How do I write that mathematically? With this big, funny-looking E. It's the Greek letter sigma, a capital sigma (Σ), and in math it means I'm going to add a bunch of things together. Here I'm adding a lot of x's together, where x stands for the data values. So I add all the data values together and then divide by how many there are. I use capital N to denote the size of a population. Written out, that is mu = (Σx) / N, where x is the data values in the population and capital N is the population size, how many data values there are.

Let me scroll over a little and give myself some room. Let's talk about what that looks like for a sample. If we're calculating the mean of a sample, it's essentially the same formula and the same idea, but we use different symbols, because we really want to distinguish between a sample and a population. Again, we use x-bar for the sample mean. We still add up all the data values, and we still use x for the data values, but to distinguish between the size of a sample and the size of a population, I divide by lowercase n. Written out, that is x-bar = (Σx) / n, where x is still the data values, but now the sample data values, and lowercase n is the sample size, how many data values are in our sample.

So depending on whether we're talking about a sample or a population, we use one set of symbols or the other. In either case, the way we calculate it is essentially the same; the symbols we use and the way we think about it are a little different. In real life, most of the time we're trying to find out about the population mean, but that's hard to do because the population is so big, or maybe there's something
preventing us from surveying the entire population and getting information from all of it. So most of the time we are trying to approximate the population mean using the sample mean. In real life we almost always use the sample formula, not the population formula: we calculate the sample mean to help us learn about the population mean.

Let's see what this formula looks like when we apply it to the data set we were already working with, and especially how we use it in our calculator. Let's clear this out. First, I'm going to create a list of the data values we have: 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4. Notice I used the c at the beginning, because I'm combining the data into a list, and I put the values between parentheses. I need to give this list a name, and I'm going to call it x. I like to call it x because the resulting formula in the calculator will look a lot like the formula we have on paper. You don't have to call it x; you can call it whatever you want. But if you call it x, and I usually will, things will look more like they do on paper.

Then we need to find how many data values we have. We could just count; we know there are 11. Or we could let the computer count for us. If I use the length function and put x between the parentheses, that tells the computer to count how long this list is, how many elements are in it, and then store that value in the variable n. That way I know x has my list of data and n is however long that list is.

With that done, what do we want to do? We want to add the data up and divide by how many there are. How do we add it up? We use the sum function, which tells the computer to add everything in this list together. This weird-looking E, this
Greek letter sigma: think about it as the sum function. Think about it as being the word "sum." We want the sum of these values. So we take the sum of the values in this list and divide by how many there are; we divide by n. Notice how much this looks like the formula on paper. The sum function plays the same role as the Greek letter sigma, then we have the x, so we're adding up all the x's, and then we divide by n. By setting this up right, we can get it to look a lot like the formula on paper.

Notice we get the same average, but now the calculator is doing all the work. All we had to do was give it the list of values; it counted how many there were, added them up, and did the calculation.

Later in the course, it's going to be really important to use the mean we calculated in other calculations, so we're going to give it a name. In this case we're imagining that we're calculating a sample mean, so we would represent it with x-bar. Now, there's no x-bar key on your keyboard. There's an x key, but we've already used x up here for the data, so we don't want that. What I tend to do is just spell it out in English. "X bar" is how we pronounce that symbol, so that's what I type out. Now every time I type it after this, the computer will know I mean the mean of that set of data, the one we already calculated.
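Putting the last steps together, a sketch of the whole calculation in R might look like the following. The exact spelling `xbar` is an assumption (the lecture only says the name is spelled out in English), `condensed` is a made-up name for the multiplication version, and the built-in `mean()` and `median()` calls at the end are not used in the lecture; they are included here only as a cross-check.

```r
# Data values from the example, combined into one list (vector) named x.
x <- c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4)

# Let the computer count how many data values there are.
n <- length(x)

# Sample mean: the sum of the x's divided by n, mirroring the formula on paper.
xbar <- sum(x) / n   # the name "xbar" is an assumed spelling

# The condensed form needs an explicit asterisk: 3 * 1, not 3(1).
condensed <- (3 * 1 + 2 * 2 + 1 * 3 + 5 * 4) / 11

print(n)          # [1] 11
print(xbar)       # [1] 2.727273
print(condensed)  # [1] 2.727273

# Cross-check with R's built-in functions (not part of the lecture).
print(mean(x))    # [1] 2.727273
print(median(x))  # [1] 3
```

Because the result is stored under a name, later calculations can reuse `xbar` instead of retyping the whole expression.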