Transcript for:
Understanding Measures of Position in Statistics

So we've looked at our measures of central tendency and measures of variability. Let's finish off looking at descriptive statistics with our measures of position. A measure of position gives us an indication of where a data point sits in our distribution, or in the data set, and how it compares to other values from that data set, or maybe even from another data set.
One of the most common ones, and we've even been talking about one previously, is quartiles. Quartiles divide your ordered data into four parts. We talked about the median, which sits in the middle and cuts the data into the bottom 50% and the top 50%. The quartiles chop the data into four parts, with 25% in each part. And you can see that your second quartile, Q2, corresponds to your median, so they're interchangeable names for the exact same thing. The first quartile is also known as the 25th percentile: 25% of the data will be below it and 75% of the data above. Q2, your median, is 50/50, and your third quartile has 75% below and 25% above. So they are just single values that correspond to each of those positions in your data set.
Now, when it comes to finding them, just like we did with our median, we want to sort the data, and ascending order makes the most sense. So we sort the data into ascending order and we ask: is n, the number of data points, even? If it is, then the median, the second quartile, which is the first one we find, will be the average of the two middle values. If it's not, so if the sample size is an odd number, then the median is that middle value. We covered this in finding the median.
The next step, once we've found that, is to exclude the median. Particularly in the case where the median is an actual data point in our data set, we exclude it and stop thinking about it as being part of the data set. And then we find the first and the third quartiles. The first quartile will be the median of the lower half of the data, and the third quartile will be the median of the upper half of the data. So the first quartile is a median in itself, but it's not the median. In finding it, we ask ourselves the same question about that lower half of the data: is the number of data points in it even? If it is, then the first quartile will be the average of the two innermost data points. If it's odd, then the first quartile will be the middle point of that lower half of the data set. And the same goes for the third quartile.
So here, for example, we've got 10 numbers. They just so happen to be the numbers 1 to 10, but it's important to remember we're not just talking about their positions, we're talking about what the data's actual values are. So we ask: is n even? Yes, it is. So the median will be the average of these two middle values, and we can see that's 5.5. Then we exclude the median. Here 5.5 is not an actual data point, so there's nothing to take out, and we only look at this lower half and get the median of that. Now, there are five data points in it, so the median of the lower half will be the middle point, 3. That's our first quartile. And then when we look at the upper half of the data, it's odd as well, so it'll be the middle point of the upper half, and we get 8. So that's the process.
The "by hand" result is us manually calculating it. I have included these other three here just to show you that, depending on the software you use, you get different answers. There's software called SPSS, there's Microsoft Excel, and then another one called Jamovi. All of them find the median to be the same value. It's generally just in the first and the third quartile that it changes, and it's based on whether they exclude or include the median. Some of them don't do either; they use a formula to find the position, and there are all sorts of different ways of calculating it. It's important to remember that if we request that you calculate something manually or by hand, you use this process, and that's what we'll be expecting.
In this next one, our data set has an odd number of values, so the second quartile, the median, will be that middle point, 6. But then we exclude that to find our Q1 and Q3, and we end up with pretty much the same answers. We've got five values in the lower half of the data, so 3 is the middle value. And then we've got these five numbers in the upper half of our data, and the middle one becomes our third quartile. Back in the previous example, SPSS didn't find the first and third quartile to be the same as our hand calculations, but in this instance it did, whereas Excel and Jamovi found them to be different to our hand calculations. So importantly, just remember, if we're asking you to calculate it manually, that's how you're going to do it.
Now, there's another measure of variation that we haven't talked about before, because we needed to introduce the concept of quartiles first. That other measure of variation is your interquartile range.
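Before moving on to the interquartile range, the by-hand quartile procedure described above can be sketched in Python. This is only an illustration of that process, not what SPSS, Excel, or Jamovi do internally. The function name `quartiles_by_hand` is made up for this sketch, and the second example assumes the odd-sized data set is the numbers 1 to 11, which is consistent with the median of 6 and the lower-half middle value of 3 mentioned in the talk.

```python
from statistics import median

def quartiles_by_hand(values):
    """Return (Q1, Q2, Q3) using the by-hand process:
    sort the data, find the median, then take the median of the
    lower half and the median of the upper half.
    When n is odd, the median itself is left out of both halves.
    Needs at least two data points."""
    data = sorted(values)
    n = len(data)
    half = n // 2               # number of points in each half
    q2 = median(data)           # average of the two middle values if n is even
    lower = data[:half]         # everything below the median position
    upper = data[n - half:]     # everything above the median position
    return median(lower), q2, median(upper)

# Ten numbers (n even): the median 5.5 is not an actual data point.
print(quartiles_by_hand(range(1, 11)))   # (3, 5.5, 8)

# Eleven numbers (n odd, assumed 1 to 11): the median 6 is excluded from both halves.
print(quartiles_by_hand(range(1, 12)))   # (3, 6, 9)
```

Slicing with `n // 2` handles the "exclude the median" step automatically: when n is odd, the middle position falls in neither half.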
With the interquartile range, what we're doing is taking your third and your first quartile and getting the difference between them. That tells you how variable the middle 50% of your data is. You'd imagine that if your data is more variable, the interquartile range gets bigger, and if it's less variable, it gets smaller. The cool thing about it is that it's less susceptible to outliers in skewed distributions, because it is only looking at that middle 50%. Just like the median was less susceptible to outliers, it's the same with all of your quartiles: because they're measures of position and they don't incorporate all the values of the data, they are robust to those outliers.
Now we have this other concept called the five-number summary. The five numbers in the five-number summary include your three quartiles, Q1, Q2 and Q3, and the other two numbers are your minimum and your maximum. So Q2 gives you a measure of central tendency, and from Q1 and Q3 you can get your interquartile range, which gives you a measure of variability. Your minimum and maximum can also give you your range, which again gives you another measure of variability. So you can get quite a bit of information from this five-number summary.
What we often do is get these five numbers and then translate them into a visual representation. You have probably heard of these as box plots, or box and whisker plots, and this is how they look. Basically you have this box: the lower end goes to Q1, whatever that value is, and the upper end goes to the third quartile. We have a line that goes through the box at the position of the second quartile, and that line can be anywhere between those two values.
That line will never be outside of the box, because the second quartile necessarily has to be between the first and the third. But it can be down close to quartile one, it can be up here, or it can be smack bang in the middle of the box. It depends on the distribution itself. So that's the box. Then the whisker part is that, from the box, we have this line and then a boundary, and that's called the whisker. The upper whisker goes to the maximum and the lower whisker extends down to the value of the minimum. This is just a box in isolation, but we would actually draw it on a scale, and draw it to scale. And then obviously the length of the box represents the interquartile range.
Some people, when they draw by hand, draw it like this, or you might draw it vertically. Software draws it in different ways, and some of them do both. I know SPSS prefers this way. Some other software I've used defaults to this, and then you can add some instructions and it'll change. But they both represent the same thing. When it's vertical, depending which way the camera shows this, you're basically rotating it 90 degrees. So what's at the left, the minimum, goes down to the bottom, and then this goes up to the top, and you can interpret it the exact same way.
Now, box plots are pretty cool. Like histograms can show distributions, box plots can show distributions. They're often the preferred way of representing skewed distributions, because of how they show the median, Q1 and Q3, which are not susceptible to outliers. And based on the shape of that box plot, you can get an idea of the symmetry, or the skewness, of the distribution. You can see here that this is a symmetrical distribution, at least roughly. It doesn't have to be exactly symmetrical, millimetre for millimetre. Even if it's approximately symmetrical, you'll see that the whiskers are approximately the same length and your median is in the middle-ish part of that box, so it's quite symmetrically shaped.
Whereas here we have a positive skew. When we were talking about skewness we had those distribution pictures, and they're sort of what histograms look like. When you've got positive skew, you can see that this whisker is long in the positive direction. We also see that the median tends to be further down towards the lower values. So remember, the body of that distribution, where most of the data is centred, and the measure of central tendency tend to be towards the lower end, away from where that tail is. And it's sort of the opposite for negative skew: you see the long whisker down towards the negative tail, and you also see the median tending to be up towards the upper values.
Now, when our distributions are skewed, we can adjust the box plots to indicate the presence of outliers. When a box plot shows the presence of these outliers, we call it a modified box plot. So with this distribution we looked at on the slide just before, you can see there's this long tail. What we can do is use these modified box plots, which are based on these formulae, and I'll talk about them in a second. Based on those, these four values, one, two, three, four, were considered outliers and not typical for our data set. So then, rather than the whisker extending to our minimum, it stops at a certain point, and everything beyond that is considered an outlier.
Now, different software will show them in different ways. The one we often use, SPSS, uses an asterisk to show an extreme outlier and just an open circle to show an outlier. But really importantly, remember that the whisker then does not represent a minimum or a maximum. So this end of the whisker is not now the minimum. This point out here is still the minimum, it's still the smallest value, but it's now classified as an outlier.
What these whiskers extend to is based on what we call fences. The rule that we use is that any value x in your data set that is less than your Q1 minus 1.5 times your interquartile range will be considered an outlier. And in terms of the upper limit, any number that's bigger than your third quartile plus 1.5 times your interquartile range is an outlier in the positive direction. It then goes and classifies extreme outliers as the next step. The format is similar, Q1 minus and Q3 plus, but instead of 1.5 times the interquartile range, it's 3 times the interquartile range.
So this boundary will work out to be some number. Say it's 20: the rule says any number that's less than 20 is an outlier. Now, this outlier might be 15. Where the whisker goes to is not necessarily the 20; it'll be the closest value in the data set that's at or above that 20. So the whisker could extend to that boundary of 20, if there is a value of 20 in your data set, or it could be 21. It'll be the next closest one that's above that boundary.
All right. So if we look at this data set that we were using for calculating our measures of central tendency and measures of variability, you can see here the minimum is 18 and the maximum is 22. Our median, because n is even, was 20.5.
We did that way back in measures of central tendency. The new ones that we're calculating here are our first and third quartiles. So we look at these four numbers in the lower half, and our first quartile will be the average of the middle two, 20 and 20, which gives us 20. And then to get the third quartile, we exclude the median and look at the upper half: it's the average of 21 and 21, which is 21.
Then what we can do is translate this into a box plot. This one I've made with SPSS, and you can see the maximum is our upper whisker, our third quartile is 21, our median of 20.5 is halfway between 20 and 21, and then Q1 is 20. The weird thing about this one is that you've got no whisker down here. Our first quartile is 20, and the next lowest value, 18, has been calculated to be an outlier; I can tell that because it's shown as a circle. Had there been, say, a 19 here between these two, then that whisker may have extended down to the 19, and the 18 would be the outlier. But here there's nothing else for that whisker to extend to, so that's why it looks like that.
Now remember we had that instance where we changed this 22 to 65. This is what that would look like, because that 65 would be considered an extreme outlier, and that's what this symbol represents here. So again, that upper whisker has not gone up to the maximum; it's gone to the next closest value, which would just be the 21. So it's not going to have an upper whisker, just like it doesn't have a lower whisker. It's only got the box, no lower whisker, no upper whisker, and then it just goes to the extreme outlier and outlier notation.
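To tie the five-number summary, the interquartile range and the fence rules together, here is a rough Python sketch of this worked example. The eight data values are an assumption: they are reconstructed from the figures quoted above (minimum 18, maximum 22, median 20.5, Q1 of 20, Q3 of 21, four numbers in each half), since the slide itself isn't reproduced in this transcript. The function names are made up for illustration, and the quartiles use the by-hand process rather than any particular software's formula.

```python
from statistics import median

def quartiles_by_hand(values):
    """(Q1, Q2, Q3) by the by-hand process: medians of the lower and
    upper halves, leaving the median out of both halves when n is odd."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data), median(data[len(data) - half:])

def modified_box_plot(values):
    """Five-number summary, IQR, whisker ends and outliers for a modified box plot."""
    data = sorted(values)
    q1, q2, q3 = quartiles_by_hand(data)
    iqr = q3 - q1
    # Fences: beyond 1.5 * IQR is an outlier, beyond 3 * IQR is an extreme outlier.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    low_extreme, high_extreme = q1 - 3 * iqr, q3 + 3 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    beyond = [x for x in data if x < low_fence or x > high_fence]
    extreme = [x for x in beyond if x < low_extreme or x > high_extreme]
    return {
        "five_number": (data[0], q1, q2, q3, data[-1]),
        "iqr": iqr,
        # Whiskers stop at the closest data values that are still inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in beyond if x not in extreme],
        "extreme_outliers": extreme,
    }

# Assumed data set, reconstructed from the summary values quoted in the talk.
ages = [18, 20, 20, 20, 21, 21, 21, 22]
print(modified_box_plot(ages))
# whiskers (20, 22): the lower whisker ends at Q1, so none is drawn; 18 is an outlier

# Same data with the 22 changed to 65.
print(modified_box_plot([18, 20, 20, 20, 21, 21, 21, 65]))
# whiskers (20, 21): no whisker on either side; 18 is an outlier, 65 an extreme outlier
```

When a whisker end equals Q1 or Q3, there is nothing for the whisker to extend to, which is why the SPSS plot shows only the box on that side.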