So we're now in section 3.2. This is a table of contents for chapter 3. What we covered before in 3.1 was something called measures of central tendency. What's the middle of a distribution? Okay, now we're in 3.2. We're covering something called measures of dispersion.
Again, we're in section 3.2, covering something called measures of dispersion. But before we can talk about dispersion, we need to start off with what a distribution is.
The formal definition of a distribution is beyond the scope of this class, but that doesn't really matter. Let me just show you something on Wikipedia. If you go to Wikipedia and search for probability distribution, you'll see that this is a fairly complicated topic, and it's obviously well beyond the scope of the course. Look right here at some of this stuff. If you've had math beyond calculus, maybe try reading this, but everyone else shouldn't worry about it. Again, the formal definition of a distribution is way beyond the scope of this class, but that doesn't matter at all. For our purposes, all we need to know is that a distribution is just a collection of data values that form a population.
And we also include the related information about the population that describes how the data values are arranged. So a distribution is just a collection of data values plus the information that describes how those data values are arranged. The three most important details about a distribution are: first, if we draw a histogram or some other graph based on the data values, what's the basic shape?
Is it bell-shaped? Is it symmetric? Is it skewed?
Is it uniform? Etc. The second important question we ask about a distribution is, where is the middle? Where is it centered?
What is the mean? What is the median? Is there a mode?
What is the mode? Etc. Section 3.2 deals with the third question: how spread out are the data values? Are most of the data values near the mean, or is there a lot of distance between the data values? Measuring dispersion allows us to answer this third question.
We have five objectives. We want to determine the range of a variable. That is just the max minus the minimum. We want to determine the standard deviation of a variable.
We want to determine the variance. We want to use something called the empirical rule and we want to use something called Chebyshev's inequality. I like this picture. It's not going to help you get a higher grade on the test or even learn the material better, but I just thought this was an interesting picture. And this is the mathematician that Chebyshev's inequality is named after.
And he lived in the 19th century and he was considered to be one of the founding fathers of Russian mathematics. So this is just a brief review from 3.1. We have the mean, we have the median, and the mode.
The two most important measures of central tendency for our purposes are the mean and the median, and for the rest of the quarter we'll be using those. We won't really talk about the mode much more in the class, but again, the mean and the median are important, and if you look at this chart, when are they used? You use the mean as your first choice.
It's the best measure of center, but the distribution has to be roughly symmetric. If your distribution is skewed to the left or to the right, then you switch over to the median as your measure of center. And what I want to talk about now is this column called interpretation, and I want to compare and contrast on a whiteboard the difference between the mean and the median.
We're going to first talk about the median, and I'm going to draw a number line. My first data value is here. My second data value is here. Third data value. Fourth data value. Fifth data value. Sixth, seventh, eighth, ninth, and tenth. Because we have an even number of data values, the median is going to be between the 5th and the 6th data values. I'm not saying this is the number 5; this is my 5th data value, and this is my 6th data value.
Okay? But the median falls in between the 5th and the 6th data value because I have an even number. So I have 50% of my data values on this side.
and I have 50% of my data values on this side. Now the beautiful thing about the median is it's resistant to extreme scores, and let me show you what I mean by that. Let me change this to red. So I'm going to extend this out this way.
Suppose I move the tenth one all the way out here, okay, so it's no longer there. It doesn't change the median at all, because all we're doing is dividing the dots: 50% below the median and 50% above the median. Even if I went way, way out, extending this a thousand more units to the right, it wouldn't change the value of the median, because the median is resistant to extreme scores.
Extreme scores don't significantly alter the value of the median. Okay, now this is going to be in contrast to the mean, and that's what we're going to look at now. Now we're going to talk about the mean. I'll put a mark right here and make this 0, because with the mean, it matters what the numbers actually are.
So here's the value 10. And this is my first data value. And here's the value 15. And this is my second data value. And this is 50 out here.
And that's my third data value. I'm only going to have 3. And I want to talk about what they mean when they reference "center of gravity." Well, the mean for these three numbers is going to be 25. If you add up 10 plus 15 plus 50, that equals 75; divide that by 3, and you get the mean, 25. What that means is the distribution balances at that point, like a center of gravity. How far each dot is away from the mean matters here, unlike above, when we were talking about the median: the median is resistant, so a dot's exact position doesn't matter.
Even if we make the 10th value 55 billion, it doesn't change what the median is. It still falls between the 5th and the 6th data point. But here, the actual value of the dot matters. So if I move this out here to 100, then it's going to change the balancing point, and my new mean is going to be about 41.7. Extreme scores strongly affect the position of the mean, because the mean is the balancing point, and the further out an extreme score goes, the more it influences the value of the mean. So pause the video on this slide and read it over to see if it makes sense to you, but that's the review from yesterday.
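If you want to check this idea yourself, here's a minimal Python sketch using only the standard library. The data values are hypothetical numbers mirroring the whiteboard examples, not anything from the textbook.

```python
import statistics

# The median is resistant: ten data values, then push the largest far out.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(statistics.median(values))        # falls between the 5th and 6th values: 5.5
values[-1] = 1000                       # move the 10th value way out to the right
print(statistics.median(values))        # still 5.5; the median doesn't move

# The mean is NOT resistant: the whiteboard example with 10, 15, and 50.
data = [10, 15, 50]
print(statistics.mean(data))            # (10 + 15 + 50) / 3 = 25, the balancing point
data[-1] = 100                          # move the extreme score out to 100
print(round(statistics.mean(data), 1))  # about 41.7; the mean shifts with it
```

Moving one dot a thousand units to the right never budges the median, but the mean chases it immediately.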
So I want to talk about symbols. We have six symbols here. The bottom two are new, but they're the topic of what we're going to study today in 3.2. But capital N is population size, little n is sample size, mu, the Greek letter, is population mean, x-bar is sample mean.
Now today we're going to talk about something called standard deviation, and standard deviation measures spread. And the symbol we have for standard deviation for the population is the Greek letter sigma. Remember we had uppercase sigma. That was a command to sum. But this symbol that you're seeing to the left of population standard deviation is the Greek letter, but it's lowercase sigma.
Then for sample standard deviation, we just use lowercase s. We'll go back to that later. This is just a review of what the objectives are before we actually get started with them. We need to determine the range, standard deviation, variance, we need to learn how to use the empirical rule, and we need to talk briefly about something called Chebyshev's inequality. This is also something that I'll mention two more times.
I'll mention it at the end of this particular video, and I'll also mention it at the beginning of the 3.2 homework help video. What I need you to understand is this: when you're using StatCrunch, like you have been in Chapter 3, you're going to go to Stat, Summary Stats, then Columns.
And then you're going to go down in this region and choose what it is that you want. Yesterday you were choosing Mean, Median, and Mode, alright? But now we're going to have to compute something called Standard Deviation and Variance. Please pay attention to what I'm going to say or you'll have difficulty doing the homework.
When we were computing the mean, the formula for the population mean and the formula for the sample mean was essentially identical, and so you didn't need to worry about that distinction when you were using StatCrunch. But the formulas for standard deviation are significantly different between sample and population, and so you have to choose the correct one in StatCrunch. Look at this slide.
Whenever you're working with a population, you use something called unadjusted standard deviation and unadjusted variance, and anytime you're working with samples, you just use standard deviation and variance. Okay, what does that mean? These formulas won't make sense to you right now, but the top one, sigma, is the formula for population standard deviation, and the bottom one, little s, is the formula for sample standard deviation. For a population, we divide by capital N, the population size.
But with a sample, we don't divide by the sample size. We divide by the sample size minus 1. And so they're calling the one with capital N, associated with sigma, the unadjusted version, because there's no minus 1. Whereas for samples, because you're subtracting 1, that's the adjusted version. And they don't call it adjusted.
They just call it simply standard deviation and variance. Whatever you do, make sure you understand this. We're going to talk about our first example, or excuse me, our first objective, standard deviation.
But first, what we want to do is talk about the context. And so we're going to have two universities and two data sets. And they want us to draw a histogram of each.
And the purpose of this is we're going to present two separate data sets with roughly the same shape and the same mean, but different amounts of dispersion or spread. So here are the IQ scores for 100 students at University A, and here are the IQ scores for 100 students at University B. We made histograms of both of them. Notice that the one for University A is more spread out and the one for University B is more compact. Both of these distributions have a mean of 100, and both are relatively bell-shaped.
Remember, there are three aspects of a distribution that we're going to focus on: its shape, its center, and its spread, or dispersion. Shape, center, spread.
Both these have roughly the same shape. They're bell shaped and symmetric. Both these have exactly the same mean of 100, but they differ in terms of University A's data values are more spread out and University B's data values are tighter.
You have more very low scores in University A and also more high scores. You have scores below 70, you have scores above 140, whereas everything in University B is concentrated between 80 and 120. Do you see where it says "in other words"? That's just the original slideshow paragraph describing what happened.
I thought I described it, but you can stop here and read this if you want. But all it does is say that these two have the same basic shape, the same mean, but they have different amounts of dispersion or spread. So the very first objective we have is determining the range of a variable. And it's the easiest way to measure spread, not the best way, but the easiest way. And all you do is you look at the largest value minus the smallest value.
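As a quick sketch, that "largest minus smallest" idea is a one-liner in Python; the scores below are made up for illustration.

```python
# Range: largest value minus smallest value, the easiest measure of spread.
# Hypothetical exam scores, just for illustration.
scores = [79, 62, 87, 94, 70, 81, 58, 90]
data_range = max(scores) - min(scores)
print(data_range)  # 94 - 58 = 36
```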
There's a quick example here. You can pause the slide and look at it, but they identify the maximum value, identify the minimum value, and subtract to get the range. Next, we're going to talk about something called standard deviation, and this is the main way of measuring spread in a data set.
Standard deviation. Whenever you use the mean to measure the center, you will always use standard deviation to measure spread. The mean always goes with standard deviation.
That's because if you look at the formula for it, it involves the mean. So this looks a little bit difficult, and we'll explain it, but there's two things I want to make clear. When we measure spread, we're measuring how much separation there is between the data values and the mean.
And so we have to look at every single data value that we have. Here's the first one, x1. Here's the second one, x2.
And we go all the way out to x subscript capital N, because that's the population size; so if we have 100 values, this would be the 100th x. For each one of those, we measure its distance away from the mean. The only reason we're squaring is that some of these numbers will be positive and some of them will be negative.
If the x is bigger than the mean, then it will be positive. If the x is below the mean, the subtraction will be negative. We want all these numbers to be positive, so we're squaring. Remember that negative 2 squared is positive 4, negative 3 squared is positive 9. When you square something, you always eliminate the negative value. So, look down here in this formula where I'm at.
You sum up all those differences, but remember you squared them to make them positive. Then you divide by capital N, the number of observations you have, and then finally you take the square root. What I want you to hear is that don't get too excited about this because this will be done almost exclusively using technology.
It's important that you know what standard deviation does. It measures spread away from the mean, right, away from the mean. And that's the reason why you take each data value and find out how far it is away from the mean. This is just a quick example.
We're not going to do this by hand ever, but it's important that you know what standard deviation does: it measures the dispersion or spread away from the mean. So our very first job in computing standard deviation is to compute how far each individual value is away from the mean.
So these are my numbers over here in this left-hand column. The mean was computed to be 79, and then I just go down the list and I compute the distance. If it's negative, it's below the mean. If it's positive, it's above the mean.
Then I square all those values because I want them to be positive. Then I add them all up. Remember, capital sigma means add; it's the summation sign. So I add up all those numbers.
I get 964. Then I divide by how many observations I had, which was 10, and I get 96.4. This is getting ahead of ourselves a little bit, but this number that you see right here, this 96.4, is something called variance. Variance is just standard deviation without the square root sign.
So in order to go from this 96.4, we just put a square root sign around our formula, square root sign around 96.4, and the answer is 9.8. And that's a measurement of spread. The higher this number is, the more spread out the data values are, the more spread out the dispersion is.
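That whole procedure can be sketched in a few lines of Python: square each deviation from the mean, average by capital N, and take the square root. The ten scores below are hypothetical, not the slide's actual data.

```python
import math

def population_sd(values):
    """Population standard deviation: divide by capital N (the 'unadjusted' version)."""
    n = len(values)
    mu = sum(values) / n                            # population mean
    squared_devs = [(x - mu) ** 2 for x in values]  # square so negatives can't cancel
    variance = sum(squared_devs) / n                # divide by N, not N - 1
    return math.sqrt(variance)                      # square root undoes the squaring

# Hypothetical population of ten scores (not the slide's data, whose mean was also 79).
population = [70, 72, 75, 78, 79, 80, 81, 83, 85, 87]
print(round(population_sd(population), 2))  # 5.18
```

The intermediate `variance` value is exactly the "standard deviation without the square root" mentioned a moment ago.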
The smaller that number, the tighter your distribution is, and the less spread away from the mean there is. You can do the same thing with samples. You first start off by measuring how far each data value is away from the mean, squaring that number in order to make them all positive. Then you divide by n minus 1, not n, and then you take the square root. The reason you have this minus 1 here is somewhat mathematical, but it comes down to what happens to a fraction when you make the denominator smaller.
Well, let me just show you. Say I have a fraction like 3 sevenths, and then another fraction, 3 sixths. This one is bigger: if you hold the numerator constant and decrease the size of the denominator, the fraction gets bigger. Look at this: one half versus one third. Which one's bigger? One half, because its denominator is smaller. Or one third versus one fourth: one third is bigger. As you decrease the size of the denominator while holding the numerator constant, the fraction gets bigger. So if I have 1 divided by 7, and then 1 divided by 7 minus 1, which is 1 divided by 6, the second one is larger. So what happens is, we want to use s, the sample standard deviation, to estimate sigma, the population standard deviation.
But the problem is that if we didn't do that minus 1 in the denominator, this s would be consistently too small. It would be an underestimate of the population standard deviation. So we're just trying to bump it up and make it slightly bigger, and that's why we subtract that 1: to make s a better estimate of sigma.
So here's another quick example of computing standard deviation by hand. Again, we have one, two, three, four sample values. We compute the sample mean to be 73.75, and for each data value we find its distance away from the mean.
Notice that some are positive and some are negative. Look at this right here: if we just sum those up without squaring them, we'll get zero every time, because the deviations above the mean and the deviations below the mean always cancel out. That's why they have to be squared.
So we have the sum of the squares of the differences divided by the sample size minus 1, and that gives us 128.25. And so the sample standard deviation equals the square root of 128.25, and we get 11.3. So now what they want us to do is compare standard deviations, and we want to talk about using technology. Look at these data values for University A and University B. Do you realize how long it would take to compute the standard deviation for those two by hand?
It would take so long. But that's the beauty of technology. It's not that difficult when you use technology.
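For example, in Python's standard library the same population-versus-sample distinction shows up as `statistics.pstdev` (divide by N, like StatCrunch's unadjusted option) versus `statistics.stdev` (divide by n minus 1). The IQ scores below are made up for illustration, not the actual university data.

```python
import statistics

# Hypothetical IQ scores, just for illustration.
iqs = [88, 94, 97, 100, 103, 106, 112]

# Population ("unadjusted" in StatCrunch): divide by N.
print(round(statistics.pstdev(iqs), 2))  # 7.35

# Sample ("adjusted"): divide by n - 1.
print(round(statistics.stdev(iqs), 2))   # 7.94
```

Notice the sample version comes out a bit larger, exactly because of the minus 1 in the denominator.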
So using technology. They computed the standard deviation. See where my cursor is right now.
And so University A is 16.1, rounded to one decimal place, and University B, the standard deviation is 8.4. So that means that University A should have more spread. Remember, the larger the standard deviation, the more spread, and that's what we're seeing in the graphs. University A is more spread out. University B is more compact.
It's a tighter distribution. Now we're going to talk about variance. That's objective three. Super easy. It's going to take about five seconds.
If you eliminate the square root, you have something called variance. That's the only difference. It doesn't even get a new symbol: just sigma and sigma squared, standard deviation and variance.
Sample standard deviation, sample variance. We're now on something called the empirical rule. This is our objective 4. The main thing that you want to understand before we go any further is that this rule only applies to bell-shaped distributions.
You have to have something, whether it's a histogram or a graph like this, that looks bell-shaped and symmetric. Memorize these numbers: 68, 95, 99.7. What the empirical rule allows us to do is make estimations about distributions that are roughly bell-shaped. Let me go to the next slide, and I'll come back to this in a second. If we look at the center and call the center mu, and then we go up one standard deviation and down one standard deviation, then the amount of data values that lie between minus one standard deviation and plus one standard deviation is going to be 68%. Now look at the logic here in this red box. We want this graph to represent 100% of all our numbers. If we know that 68% are in the blue region, then we know that we have 32% left over in the two tails. We divide that 32% by 2, and that means I have 16% on each side.
Okay, 16% in this tail on the right, 16% in this tail on the left. If we go out plus or minus two standard deviations, then I cover approximately 95% of my data values. Mu, look right here where the cursor is, mu. Mu plus two standard deviations, mu minus two standard deviations, and then that blue region in the middle contains 95% of my data values. That means I have 5% left over in the two tails.
But if I divide by 2, that means I have 2.5% in each tail. You can do the same thing for 3 standard deviations. If you go out plus or minus 3 standard deviations, then the blue area encompasses 99.7%, almost all of the data, leaving us only 0.3% to divide between the two tails. We divide that by 2, and that means I have 0.15% in each of those tails separately. Suppose I have a bell-shaped distribution, and suppose the mean is 100 and the standard deviation is 10. So if I come over here to 110 and over here to 90, that's plus one standard deviation up and minus one down. Why?
Because 100 plus 10 is 110. 100 minus 10 is 90. So that means I have 68% here of my numbers between 90 and 110. And I have 16% here above 110 and I have 16% here below 90. Let's do it again. Same situation. My mean is 100 and my standard deviation is 10. So I come out here to 110, one standard deviation. Now I'm at 120, that's two standard deviations.
Here I am at 90, minus one standard deviation. And now I'm at 80, that's minus two. So 80 is minus two standard deviations and 120 is plus two standard deviations. That means I have approximately 95% of my data values between 80 and 120. How much do I have above 120? Well, I have 2.5%.
How much do I have below 80? 2.5%. Try the third example. We're going to go out plus or minus 3. Our mean is still 100, and our standard deviation is still 10. I go to 110, that's one standard deviation; 120, two; and all the way out to 130, that's plus three standard deviations. Then 90 is minus one, 80 is minus two, and 70 is minus three standard deviations. That means I have almost all my data, 99.7%, between those values, and so I have 0.15% above 130 and 0.15% below 70. Pause this slide and look at it.
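The three whiteboard examples can be captured in one small sketch; this helper function is just an illustration, not anything from the textbook.

```python
# Empirical rule (bell-shaped distributions only): the 68-95-99.7 rule.
MIDDLE_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_rule(mean, sd, k):
    """Cutoffs and percentages for k = 1, 2, or 3 standard deviations."""
    middle = MIDDLE_PERCENT[k]
    each_tail = round((100.0 - middle) / 2, 2)  # split the leftover between two tails
    return mean - k * sd, mean + k * sd, middle, each_tail

# The whiteboard example: mean 100, standard deviation 10.
print(empirical_rule(100, 10, 1))  # (90, 110, 68.0, 16.0)
print(empirical_rule(100, 10, 2))  # (80, 120, 95.0, 2.5)
print(empirical_rule(100, 10, 3))  # (70, 130, 99.7, 0.15)
```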
This is what was provided by the publisher. And this is true. Everything on this slide is true, but I think it's confusing the way they've done it.
And I like these pictures better. But look this over and see if it makes sense to you. This next example is similar to a homework problem.
What you have to understand is the empirical rule is a way of estimating, it's not exact, it's a way of estimating how many of your data values or what percent of your data values will fall between a certain number of standard deviations. But do you see in Part C where it says actual? In Part B they want us to do an estimate using the empirical rule, but in Part C they want us to actually count the values. So we have four parts here and let's address those.
So the question in Part A was: what do you have between plus or minus three standard deviations? Well, we know that the empirical rule says that when you have plus or minus three standard deviations, you're going to get 99.7% in the middle, and that's all they want you to say here in Part A. But then they actually compute the values. Our standard deviation was given to be 16.1, so they multiply that by 3 and get 48.3, and they add that to 100 and subtract it from 100, so the markers for the plus 3 and minus 3 standard deviations are 148.3 and 51.7. Now, in Part B, they want to know what percent of the data values should fall between 67.8 and 132.2. If the mean is 100 and the standard deviation is 16.1, then 67.8 is two standard deviations below, and 132.2 is two standard deviations above.
And so, if we're talking about the middle between those numbers, then we're talking about 95% of the data values, and that's what we have. Now, in Part C, they want us to take those numbers, 67.8 and 132.2. Remember, using the empirical rule we estimated that about 95% of the data should fall between them. In Part C, we go back to our table (we would never do this by hand; this would all be done using technology) and figure out how many of the values are actually between 67.8 and 132.2. We see that 96 out of 100 of them are, which is 96%, in close agreement with our estimated 95%.
But the main thing you want to understand between B and C, compare and contrast: B is an approximation using the empirical rule, and in C you actually count the values to see what percent you have between the marks 67.8 and 132.2. In Part D, they want to know about the tail above 132.2. Since 132.2 is two standard deviations above the mean, we're over here, and for two standard deviations above the mean, that means we have 2.5% in the tail above that value. In the picture they compute 2.35% plus 0.15% to get 2.5%, but it's easy enough just to go straight to the 2.5%. Chebyshev's inequality is not as important as the empirical rule. It's the same thing in the sense that it's a tool for estimating, but Chebyshev's is one-size-fits-all.
There are no restrictions on which distributions you can use it with, but it's not as precise. It's a one size fits all, whereas the empirical rule is very precise, but it only works for bell-shaped and symmetric distributions. Okay, and so you have this formula 1 minus 1 over k squared, where k represents how many standard deviations. So if you're talking about two standard deviations, you'd get 1 minus 1 over 2 squared.
Let's look at an example. Determine the minimum percentage of students who have IQ scores within three standard deviations of the mean. They said three standard deviations, so we do 1 minus 1 over 3 squared, which gives us 88.9%. Notice that if we were using the empirical rule, we would get 99.7%, but Chebyshev's is one-size-fits-all, so it's not as precise, and it's only saying we should see at least this number, 88.9%, or more.
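As a quick sketch, Chebyshev's formula is tiny in code; this helper function is just for illustration.

```python
def chebyshev_minimum(k):
    """Chebyshev's inequality: AT LEAST 1 - 1/k^2 of the data lies within
    k standard deviations of the mean, for ANY distribution (k > 1)."""
    return 1 - 1 / k ** 2

print(round(chebyshev_minimum(3) * 100, 1))  # at least 88.9% within 3 SDs
print(round(chebyshev_minimum(2) * 100, 1))  # at least 75.0% within 2 SDs
```

Compare 88.9% with the empirical rule's 99.7% for three standard deviations: Chebyshev gives a much looser bound because it has to hold for every distribution, not just bell-shaped ones.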
In Part B, they want us to do an estimate between 67.8 and 132.2, but when we were doing the empirical rule, we saw that that's plus or minus two standard deviations. And so if I make my k equal to 2 right here, then I get 1 minus 1 over 4, or 75%. But when I actually counted this, remember, in the last example of the empirical rule, 96% of my scores were between those two numbers.
But Chebyshev's is only predicting at least 75%, meaning that number or more. Again, Chebyshev's is not as important as the empirical rule. We need to cover it because it's in the objectives, and its benefit over the empirical rule is that it's one-size-fits-all; there are no restrictions on which distributions it can be used with. But it's not as accurate as the empirical rule, and the drawback of the empirical rule is that it can only be used with bell-shaped distributions. To review: we were supposed to talk about the range. That's the max minus the min. We were supposed to talk about the standard deviation.
Remember that that measures the spread of the distribution. Then variance is just the square of standard deviation, so it's no big deal at all. The empirical rule, remember, is called the 68-95-99.7 rule, which corresponds to plus or minus 1, plus or minus 2, and plus or minus 3 standard deviations.
But the main thing to remember, besides those numbers, is that the empirical rule only applies to bell-shaped distributions. And then we talked about Chebyshev's inequality, and the benefit of Chebyshev's is that it works for any distribution, but it's clearly not as precise as the empirical rule.
There are your symbols again. You're going to see these in the homework; it's a homework question, but make sure that you start keeping track of them, because there aren't that many, right? The symbols you need to memorize are important, and you need to have them clear in your head. This is also super important when you're using StatCrunch for the homework in 3.2.
Make sure you distinguish between population standard deviation and sample standard deviation. If you're working with a population, you have to use something in StatCrunch called unadjusted, whereas if you're working with samples, you just use regular standard deviation.