Transcript for:
Normal Distribution and Empirical Rule

How can I use mean and standard deviation to determine when something's unusual? What we are going to focus on are unimodal and symmetric graphs, much like what we have here in blue. Let's remember we gave that a name; we called that normal. Remember that when our graph is symmetric and unimodal, we developed the idea of mean and standard deviation. Mean and standard deviation help us identify the center, the typical value, and how our data values spread out around it. We're going to bring these three ideas together to develop what we call the empirical rule. The empirical rule, combined with the z-score, will help us do a very important task: identifying unusual observations.
Remember when we were in chapter two, and we would look at a histogram or a dot plot and ask ourselves, "Is that really unusual?" It felt very hand-wavy. Sometimes it was very clear; other times it was a little questionable. So what we're going to do today is use our friends mean and standard deviation to develop the empirical rule, which will ultimately lead to the z-score formula. That formula, that specific number, that line in the sand, is going to tell us, "Alright, you are beyond the point of no return; you are considered unusual." Let's do it, guys!
Empirical rule: it all begins here. For starters, I need to make a point of saying this rule only makes sense for unimodal and symmetric graphs. So we're talking about graphs with one mode that are relatively symmetric on the left and right-hand sides. Now, again, because we are looking at a symmetric graph, I want to remind you that the mean is in the center.
We also said that we can add or subtract a standard deviation from the mean, and you know what? We can keep doing that. We can keep adding standard deviations beyond the mean, so you can go two standard deviations or three standard deviations above the mean. You can also go in the other direction. You can subtract one standard deviation from the mean, subtract two standard deviations from the mean, subtract three standard deviations from the mean. And what you are doing then is chunking this unimodal and symmetric graph, this normal graph, into three distinct chunks. And here we go; we're in 3.2 right now.
Alright, here we go, guys. How are we going to be chunking this? First chunk. If you go one standard deviation above the mean and one standard deviation below the mean, and you look at all the data that is within one standard deviation of the mean, you are looking at about 68% of the data. That, my friends, is the first conclusion of the empirical rule: if you're looking at all the data that is within one standard deviation of the mean, of the center, you're looking at 68% of your data.
We can keep going. Let's now look at two standard deviations from the mean. If you're looking at all of the data within two standard deviations of the mean, you are looking at 95% of your data. Ninety-five percent of your data is within two standard deviations of the mean.
Lastly, if you are within three standard deviations of the mean, pretty much I am looking at the entire graph except these bits on the end. Anything that's beyond three standard deviations is just an itty-bitty tail on the end, so we are looking at a whopping 99.7% of our data. Ninety-nine point seven percent of our data is within three standard deviations of the mean. These three conclusions are the empirical rule.
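As a rough check on where those three percentages come from, here is a minimal Python sketch. It assumes the data follow an exactly normal curve, which is an idealization of "unimodal and symmetric", and the function name `fraction_within` is made up for this illustration. It uses the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)).

```python
import math

def fraction_within(k):
    """Fraction of an exactly normal distribution that lies within
    k standard deviations of the mean (hypothetical helper name)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.3%, 95.4%, and 99.7%
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The empirical rule's 68%, 95%, and 99.7% are rounded versions of these values, so for real data that is only roughly normal they should be read as approximations.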
And if you're like, "Shannon, how are we going to remember what feel like these random percentages?" Well, for starters, remember 68% is literally how we defined the idea of the majority back in 3.1. So that 68% should already be a number floating through your head. But for the other ones, remember you are allowed a note sheet on the exam. This topic today is on exam one, but don't fret; you're allowed a note sheet.
Now that we've looked at the empirical rule, we can use these three main ideas to answer some really interesting questions about our data set. So let's do that in example one. The mean body weight for women between ages 18 and 25 years old is 134 pounds, with a standard deviation of 26 pounds. Assume the data is symmetric and unimodal. Complete the following sentences.
So, guys, before we even look at the sentences, let's just understand my circumstance. This weight data is both symmetric and unimodal. What that's emphasizing is that I can use this beautiful normal curve on my paper. Now I'm given two pieces of information: mean and standard deviation. The mean weight here is 134 pounds. Can you tell me where I am going to write that 134? Am I going to write it on one of the left tick marks, one of the right tick marks, or the middle tick mark? It should go in the middle. Perfect! The mean should go right smack dab in the middle.
After that, we take the standard deviation, which it says here is 26 pounds, and what we're going to do with that standard deviation is add 26 to move to each tick mark on the right. So, 134 plus 26 is 160, plus 26 is 186, plus 26 is 212. In the same way, we move to each tick mark on the left by subtracting a standard deviation of 26. 134 minus 26 is 108, minus 26 is 82, minus 26 is 56. So, let's look at question a.
About 68% of women ages 18 to 25 years old are between blank and blank. Those blanks are weights; we can see that by the "pounds." So what I'm looking for are two weights, two numbers on this number line to fill in here. That's where we go back to the empirical rule and see that 68% is within one standard deviation, within plus or minus one standard deviation of the mean. And so we ask ourselves, "Alright, if I go up one standard deviation and down one standard deviation, which two weights do I end up with?" I see that they are 108 and 160. And that is how we use the empirical rule to answer an interesting question. About 68% of women are between 108 and 160 pounds.
But I want you guys to know you can also use the empirical rule in the other direction. You can also find a percent of data. What percent of 18 to 25-year-old women are between two different weights? It's still about counting the number of standard deviations. So let's look at 82, and note the fact that 82 is a total of one, two standard deviations from the center. Notice that 82 is two standard deviations below the center, in the same way that the second number, 186, is two standard deviations above the mean. And so let's think about that. We are now within plus or minus two standard deviations. And going back to the empirical rule, when you are within two standard deviations of the mean, which percentage are we looking at? 95%. Ninety-five percent of our data is within two standard deviations.
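The arithmetic in example one can be sketched in a few lines of Python. This is only an illustration under the example's stated numbers (mean 134 pounds, standard deviation 26 pounds); the function names and the `EMPIRICAL_RULE` lookup table are hypothetical and not part of the lecture. The lookup only handles intervals that sit exactly on the tick marks, since that is all the empirical rule covers.

```python
MEAN = 134  # pounds, given in example one
SD = 26     # pounds, given in example one

# Empirical-rule percentages, keyed by number of standard deviations
EMPIRICAL_RULE = {1: 68, 2: 95, 3: 99.7}

def tick_marks(mean, sd):
    """Seven tick marks: mean - 3 SD up through mean + 3 SD."""
    return [mean + k * sd for k in range(-3, 4)]

def interval_for_percent(mean, sd, percent):
    """Question a direction: from a percentage to the two weights."""
    for k, pct in EMPIRICAL_RULE.items():
        if pct == percent:
            return (mean - k * sd, mean + k * sd)
    raise ValueError("percent is not one of the empirical-rule values")

def percent_for_interval(mean, sd, low, high):
    """Other direction: from two tick-mark weights to the percentage."""
    for k, pct in EMPIRICAL_RULE.items():
        if low == mean - k * sd and high == mean + k * sd:
            return pct
    raise ValueError("interval is not a symmetric whole number of SDs")

print(tick_marks(MEAN, SD))                     # [56, 82, 108, 134, 160, 186, 212]
print(interval_for_percent(MEAN, SD, 68))       # (108, 160)
print(percent_for_interval(MEAN, SD, 82, 186))  # 95
```

Both directions come down to the same step the lecture describes: counting how many standard deviations the endpoints sit from the mean.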