Transcript for:
Understanding Data Spread and Standard Deviation

Chapter two was all about graphing, and in particular, in chapter two, we learned this idea of spread. We learned it to help us answer: What does my graph look like? And ultimately, we used trigger words like high or low, spread out or clustered together. Yet, at the end of the day, it felt very wishy-washy. It felt very hand-wavy what spread would actually mean. But now, in chapter three, we have learned this idea of standard deviation. And what we're going to see here is that spread and standard deviation are directly related to each other. Let's see what that means.
All right, let's go back and remember that standard deviation is saying that the majority of the data, so that's about 68% when the data is roughly bell-shaped, falls within one standard deviation of the mean. So, what do I mean by that? We have here two graphs, two histograms of data. And again, the mean, we said, is the center: where is my data living, and where does the center of my data look to be? In black, I can draw roughly where the mean is. So, the question then is, what is standard deviation? Standard deviation is asking: How far do I need to draw out from that center, in both directions, so that when I shade all of the data in that region in red, I am looking at 68% of my data?
I'll say that one more time: Standard deviation is representing how far out I need to go from the mean so that if I look at all the data in that region around the mean, we are looking at 68% of our data. So practically, I'm asking myself: How far out from the mean do I go so that the shading in red is 68% of the overall color? What I am trying to emphasize here is that the distance from the center to the edge of that red region, that distance is the standard deviation.
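The 68% idea above can be checked numerically. What follows is a minimal sketch that is not part of the lecture: it assumes Python's standard library and a simulated bell-shaped sample, and the center of 50 and standard deviation of 10 are made-up values chosen only for illustration.

```python
import random
import statistics


def fraction_within_one_sd(data):
    """Return the share of values within one standard deviation of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    inside = [x for x in data if mean - sd <= x <= mean + sd]
    return len(inside) / len(data)


random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical bell-shaped data: center 50, standard deviation 10
sample = [random.gauss(50, 10) for _ in range(10_000)]

share = fraction_within_one_sd(sample)
print(f"mean = {statistics.mean(sample):.1f}")
print(f"standard deviation = {statistics.stdev(sample):.1f}")
print(f"share within one standard deviation = {share:.2f}")
```

For bell-shaped data like this, the printed share should land close to 0.68. For data with a very different shape, the share inside that region can differ.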
And what we're going to do now is really have the rubber meet the road. We're now going to make our connection between chapter two and chapter three where, once again, we bring up that idea of spread that we learned in chapter two and connect it to the idea of standard deviation here in chapter three. My first question for you is: Comparing the blue graph and the orange graph, which graph is more spread out? Is the blue graph more spread out, or is the orange graph more spread out? Yeah, just visually we can see here that the blue graph is way more spread out than the orange graph. Remember in chapter two we said that a spread-out graph had high spread. Therefore, what we're seeing in the orange graph is that it has low spread. Remember, low spread is emphasizing that our graph is more clustered together. We can just see it visually: the blue graph is wider, the orange graph is narrower. That is how we described spread in chapter two.
But now what we're going to do is connect the two. We're going to connect spread and standard deviation, because now standard deviation is the physical distance we are moving from the center to the edge. I'll say that one more time: Standard deviation is now the physical distance we are moving from the center to the edge. And my question is: Which graph, blue or orange, has the larger standard deviation? Which has the larger distance in green? Yeah, it's once again the blue graph. Notice how the more spread-out graph had the larger standard deviation. And there, my friends, is the connection between that ethereal idea of spread in chapter two and this concrete number, standard deviation. Just like what we calculated, we now have a concrete number to tell us: Standard deviation will be large when your data is spread out. Standard deviation will be small when your data is clustered together.
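The blue-versus-orange comparison can be sketched with numbers. The two lists below are hypothetical data sets, not the data behind the lecture's graphs; they are built to share the same center of 50 so that only the spread differs.

```python
import statistics

# Hypothetical data sets, both with a mean of 50
orange = [46, 48, 49, 50, 50, 50, 51, 52, 54]  # clustered together
blue = [20, 30, 40, 45, 50, 55, 60, 70, 80]    # spread out

orange_sd = statistics.stdev(orange)
blue_sd = statistics.stdev(blue)

print(f"orange: mean = {statistics.mean(orange)}, sd = {orange_sd:.2f}")
print(f"blue:   mean = {statistics.mean(blue)}, sd = {blue_sd:.2f}")
print("larger standard deviation:", "blue" if blue_sd > orange_sd else "orange")
```

Because both lists have the same mean, the difference in the two standard deviations comes only from how far the values sit from that center.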
And these, then, are the two relationships that you will always see between spread and standard deviation. The more spread out your data, the larger the standard deviation. The more clustered together your data, the smaller the standard deviation. I want you guys to look at these three graphs, A, B, and C, and identify which one has the smallest standard deviation and which one has the largest. There are three graphs; only one can have the smallest standard deviation, and only one can have the largest. I'll give you a few moments to work on that.
Which graph has the most data clustered in the middle? It's graph C. C is going to be the graph with the smallest standard deviation. And again, let's talk about why. Notice how our data is the most clustered around the center. We have most of the data in the middle. Therefore, to capture 68% of our data, we only had to travel a pretty small distance out from the center. See again, clustered together. When all of your data is the most clustered around the center is when you will have the smallest standard deviation.
So then, guys, which one is going to have the largest standard deviation, A or B? Yeah, it's going to be graph A. Graph A will have the largest standard deviation. Again, why? It's because you're going to take the center and ask yourself how far out do I need to go so that I'm looking at 68% of my data. And we can see here that because the majority of the data is on the edges of the graph, you've got to go pretty far from the center to have gathered 68% of your data. Your data here is the most spread out.
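The ranking of graphs A, B, and C can be sketched the same way. The three lists below are hypothetical stand-ins for the lecture's graphs (the transcript does not give the actual data): A piles its values at the edges, B spreads them fairly evenly, and C piles them in the middle. All three are built to have the same mean and the same lowest and highest values.

```python
import statistics

# Hypothetical data sets: each has mean 5, minimum 1, and maximum 9
graphs = {
    "A": [1, 1, 1, 1, 2, 5, 8, 9, 9, 9, 9],  # most data on the edges
    "B": [1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9],  # fairly even
    "C": [1, 4, 5, 5, 5, 5, 5, 5, 5, 6, 9],  # most data in the middle
}

# Standard deviation of each data set
sds = {name: statistics.stdev(values) for name, values in graphs.items()}

smallest = min(sds, key=sds.get)
largest = max(sds, key=sds.get)

for name, sd in sds.items():
    print(f"graph {name}: sd = {sd:.2f}")
print("smallest standard deviation:", smallest)
print("largest standard deviation:", largest)
```

Since the three lists share the same center, lowest value, and highest value, the overall width of the graph cannot tell them apart. The standard deviation can, because it responds to where the bulk of the data sits relative to the center.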
I wanted to just visually emphasize to you guys how we can judge the size of the standard deviation from the graph, really connecting the skills we learned in chapter two, creating those histograms and learning the idea of spread, with standard deviation, the number that explicitly tells us how spread out (large standard deviation) or clustered together (small standard deviation) my data is.