Understanding Data Variation and Standard Deviation

So if you keep track we're on 3.3 now. What variation means for us when we're talking about data sets is really how the data is spread out. How is it varying piece by piece? Or as a whole, how is it varying? So I'll make it simple for us.

We'll say that variation really calculates how the data is spread apart. Let me give you an example about maybe why this is important. And you'll kind of see why.

Because a lot of people are like, well, doesn't the mean just tell us everything we need to know about a data set? Isn't the average the most important thing? And the answer is the average is pretty important. We use the mean for a lot of things. But it's not the only thing that we're concerned about.

And here's why. Let's say that, have you ever gone into a bank line and waited for a really long time? Have you ever thought there might be a better way to do bank lines than just everybody sit and wait? Well, this person tried this experiment. And here's what happened.

He took three different banks. This person was not me. I didn't do this. I'm not that concerned.

There's like four things I'm annoyed with, and one of them is waiting in lines. I hate, I hate waiting in lines. It is the worst thing ever in the world, lines. It's ridiculous. You have to wait in line to pay for something?

That's even worse. You wait in line to get punished. Anyway, I hate waiting in lines.

But this is not about me. So this is bank lines. And they timed the wait times of three banks. Here's bank number one, number two, and number three. And they had three different people go into these banks.

In bank number one, there was a guy standing in the front, and what he did was he communicated to people: okay, you go to this teller, you go to this teller, you go to this teller. Whoever was done first, that's where he put the next person. So he was very interactive with the people and made sure that no one was waiting too long, and he moved them around. So he did everything himself, and all three people waited six minutes each.

Okay, the next bank was like your typical bank. Okay, the typical bank is everyone stands in a really long line and when someone leaves you just go into that spot. You've seen that one, right?

So that's how most banks are. And the first person waited four minutes, the next people waited seven minutes each. Now the last bank, this is like the experiment part of it, they did what like the grocery store does. In a grocery store, you don't just stand in one line and wait for the grocer to open, right? You have to pick your line ahead of time, which we all know, for me, it turns out bad every single time.

I always do a long one. It's ridiculous. It's like two people and it takes 30 minutes. And like this line with eight people goes really quick.

That ever happen to you? Every time. Every time.

Every time. It doesn't matter. So anyway, that's what this bank did. It said you get to pick your line, but don't get this line.

So the first person, he only had to wait one minute. Lucky him. The second person had to wait three minutes, and the third person was me: 14 minutes. Okay. Now here's what I want to do.

What I'd like you to do on your own right now, you don't have to say it out loud, but do this on your own, I want you to find the mean for each of these three data sets. So what's the mean of this sample and this one? Hopefully this one's pretty easy, and this one. You can use your new calculator trick, but there's only three numbers, so it's really not hard anyway.

So do that on your own. Find the mean for each of those three data sets, please. Oh, yeah, you might need your calculator. I might, yeah. Sorry.

That was 10 minutes away from being sold on eBay. It's happening. Got it yet? What's the mean for the first data set?

Six. Yeah, if that happens any time, you have the same number over and over again. Of course, that is the average. So we have, what's the mean for this one?

Six. Yeah, when you add them all up, you get 18. That's it. That's it. There we go.

We get six. How about for the last one? Six. So if all we had was the mean, do you see how these would look pretty much the same if that's all we could calculate?

They all have the same mean. Are they all the same data set? No, clearly not. I mean, if you just look at the mean, all the information is not given to you about these sets of data. This one was completely uniform.

No one waited a different time than six minutes. This one was fairly uniform. I mean, they're pretty close together. This one's way off. These people waited a vastly different amount of time, didn't they?

One person waited 14 minutes, one person waited one minute. So if all we had was the mean, we really couldn't convey that information. Fortunately for us, we can talk about variation, the spread of our data, in different ways.
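As a quick check on the bank numbers, here is a minimal Python sketch. Python is not part of the lecture, and the variable names `bank_1`, `bank_2`, and `bank_3` are made up for illustration; the wait times are the ones from the example.

```python
from statistics import mean

# Wait times in minutes, from the lecture's bank example.
bank_1 = [6, 6, 6]    # greeter sends each person to a teller
bank_2 = [4, 7, 7]    # one long line, the "typical" bank
bank_3 = [1, 3, 14]   # everyone picks their own line

for name, waits in [("bank 1", bank_1), ("bank 2", bank_2), ("bank 3", bank_3)]:
    # Each data set adds up to 18, so each mean is 18 / 3 = 6.
    print(name, waits, "mean =", mean(waits))
```

All three lines print a mean of 6, even though the three lists look nothing alike, which is the point of the example.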

The first way we can do this is just with a simple range. Just the range of the data.

Highest minus lowest number. I mean that really does tell you some sort of spread, doesn't it? It really does.

Why would we use the range? Well firstly, it's by far the easiest to calculate. You just find the biggest number and you find the smallest number and you subtract it.

That's pretty darn easy to find. So range, that's max value minus min value. It's very easy to find. But can you think about this for a second and tell me one downfall of using the range? What do you think?

There could be. Sure, yeah, there could be something really high and really low that does not correlate at all to the middle terms. Do you guys understand that?

How many pieces of data are you considering when you're finding the range? Yeah, just two. How many pieces of data could you have? Hundreds, thousands. So to narrow it down to just two data values that are determining all your variation is probably not a good thing.

It only considers two of your data values and leaves all the rest of them out. Now here we only have three. So here, yeah, the range might be okay to use.

Here we have zero, that they're not changing at all. Here we have three. That gives you a pretty good idea about the variation in there.

Here we have 13. That gives you a really good idea that things are varying, that there's only one other digit that's not accounted for, one other number. But if I give you 100 data values, and like she said, what was your name again? Jeanette.

Well, Jeanette said, if one of them was way out there, I mean the range, you could have all your numbers between 1 and 10, right, and then you could have one number that's like 200, and that would make your variation seem way, way, way different than what it is. You with me on this?

So there is another way. The range, yeah, it's very easy to find as max minus min, but it doesn't take into account all the values. So that must mean there's a different way, and sure enough, there is.
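A small Python sketch of the range, assuming nothing beyond max minus min. The helper name `data_range` and the 100-value list are hypothetical; the list is just one way to build "numbers between 1 and 10 plus one value of 200."

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# The three bank data sets from the lecture.
print(data_range([6, 6, 6]))    # 0
print(data_range([4, 7, 7]))    # 3
print(data_range([1, 3, 14]))   # 13

# Hypothetical data: 99 values between 1 and 10, plus one value of 200.
mostly_small = [1 + (i % 10) for i in range(99)] + [200]
print(data_range(mostly_small[:-1]))  # 9 without the far-out value
print(data_range(mostly_small))       # 199 once the 200 is included
```

Only two of the values decide the answer each time, so one far-out value moves the range from 9 to 199 while the other 99 values are ignored.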

The other way that we can measure variation in here is called the standard deviation. Can you say that with me? Standard deviation.

Yeah, well people, they lose those words, so we'll make sure it gets in there and make sure it sticks in your head because we are going to be using the standard deviation a lot. Okay, almost everything we do is going to be based around the standard deviation. And the mean. You're probably wondering what in the world is a standard deviation?

You just made us say this. Silly word, but I really don't know what it means. Let's talk about it.

For us, the standard deviation really is the most important and the most useful measure of the variation. So this is the one we're going to be using. This is number two?

Yeah. I'm sorry. Thanks.

The most important one and the most useful one for us. What it does, the standard part of this, how it says standard deviation, the standard really kind of means the average. It means what happens the most, what's the common deviation. So standard means average. Deviation means how far away it is, how far it deviates.

Deviate means a separation from, right, you know about deviations. A separation from the average term or, in our case, the mean.

So what the standard deviation calculates is the average distance your data points are from the mean. Let's say it one more time. The standard deviation calculates the average distance your data values are from the mean. It measures the average distance your data values are from the mean. Before I get into the actual calculation and the formula that we're going to be using here, I want to give you some properties about the standard deviation.

First thing is it does compare the distances of the values from the mean. That's there. But second thing, the standard deviation is never going to be negative because we're talking about a distance. And what we're going to be doing is taking those distances, either left or right of the mean, and calculating them so that they come out positive.

I'll show you in the formula why that is the way it is. But we wanna calculate the average distance. If we took positive values and negative values and put them together in an average, some would cancel out. Do you see what I'm talking about?

They'd go away. The positives and the negatives would make it seem like there's not that deviation from the mean. And so what we need to know is that standard deviation is gonna turn out that it is never negative. Actually, that's not, well, it's never negative, that's true, but it's also never zero unless one thing happens.

The standard deviation is never zero unless, can you tell me which one of these might have a standard deviation of zero? Why the first one? Yeah, there's no deviation, right? The mean is 6, all the values are 6. If you found the distance from 6, it's 0. So it's never 0 unless all the data entries are the same. So never negative, and it's never 0 unless all the entries are the same.

The other thing I need you to know is that standard deviation is one of those things that's going to be greatly affected by, what were those things that were way outside the normal data? Outliers. It's greatly affected by outliers.

You're going to see why in just a little bit, but it really is. If you have an outlier in there, it's going to make your variation, your standard deviation seem a lot bigger. So, greatly affected by outliers. One more thing, the sample standard deviation that we're going to be using in here, it has a letter: s.

So whenever you see that lowercase letter s, that means sample standard deviation. Just a little s. Would you like to see the formula? No? It's very exciting. It's the thing you woke up today on Friday and said, I hope Mr. Leonard gives me the formula for standard deviation of a sample.

I really hope that happens today because I've been looking forward to it all week. Weren't you saying that this morning? I know you were. Okay, remember these numbers.

You have them down, right? We're going to use those in a second. Now, there's a little bit more math that goes into calculating this thing that I'm going to give you, but I am going to define it in a way that you can kind of see what's going on. Because understand, what's standard deviation again?

It's the... Right. So we need to somehow manage to calculate an average distance.

That's what this formula's going to do for us. And I hope you see that in the formula. So sample standard deviation.

Okay. I'm going to erase over here because I'm going to need some room to work. How do you find the distance between two things?

How would you find the distance from the number 7 to the number 12? You take what minus what? Or if it's on a number line, you subtract one from the other, and then that gives you a distance in some direction, right?

So here, what we're going to be doing is we're going to take the distance from a data value, what letter represents a data value? That's it? X?

We'll take the data value and to find the distance from the mean, we'll subtract. We'll find the distance between these two numbers. Subtract. What are we going to subtract there? What's the...what now?

The mean. The mean. What's the symbol for the mean?

Why are we not using mu? Okay, so this is a sample. I hope you guys are with me on this. Are you okay on what I'm doing up here? X's are our data values, right?

X-bar is our sample mean. We're dealing with sample standard deviation. That's what we're talking about.

So if I take x minus x-bar, can you all understand this is going to give me the difference or the distance between the data value and the mean itself? You okay with that? Now, the problem with doing this is that, I explained it earlier, because this is written a certain way, x minus x-bar, some of your data values are going to be to the left of the mean, like smaller than the mean.

Some are going to be bigger than the mean. Are you with me on this? Some are going to be positive and some are going to be negative. Tell me something about an average. How do you find the mean of something?

What do you need to do with your pieces of information first? When you find an average, average, you all did the mean on your homework. What'd you do with those numbers? Added them. Added them.

So if I have numbers that are both positive and negative and I add them together, it's going to eliminate a lot of differences. In fact, it's going to come out to zero every time. So I would get a variation of zero every time. So to eliminate that fact, that I'm going to have positives and negatives, and if I add them together like an average is supposed to do, it's going to go to zero.

What I'm going to do to make these all positive is, what makes something positive besides absolute value because we can't do that? What makes something positive all the time, no matter what, this operation does make a positive? Square. Yeah, let's square it. We're going to square these.

We're going to square these numbers. So when we square numbers, that's going to kind of explode our outliers, that distance. So it's going to explode it. That's part of the reason why that happens.

You're going to see something else in a minute. So if this is an average, what we've done so far is found the distance. We've squared them to make them all positive.

There's a little bit more math that goes into that, but I'm just going to keep it so you kind of get the idea. Distance between numbers, we're squaring it. How do you find an average? What do you do with all those terms? Before you divide, you have to do this.

Before you divide them, you have to put them all together. You have to add them. How do you add something? What's the symbol for adding in this class?

Thank you. So we found the distance, we squared them to make them positive, we've added them all up, and then we're going to divide by the number of things that you just added. There's one little exception here. This is where it gets kind of weird for some people.

The number of things you just added, that was the lowercase letter n. Do you remember that from your sample days? We talked about that. So we'd divide by n. However, we're going to divide by n minus 1. What this does is take your sample size and lower it by one, which pretty much overestimates the variation.

Think about this. If I divide something by a smaller number, the whole thing gets bigger, right? It's overestimating.

That's fine because we're not dealing with a population. Our sample is not exactly a population, so we need to overestimate the variation in order to use it. That's basically the idea here.
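To see the size of that effect on numbers from the lecture, here is a rough Python sketch using the 1, 3, 14 wait times. The variable names are illustrative only, and this just compares the two divisors; it is not meant as a justification of n minus 1.

```python
from math import sqrt

data = [1, 3, 14]          # the third bank's wait times
n = len(data)
x_bar = sum(data) / n      # 6.0

# Sum of squared distances from the mean: 25 + 9 + 64 = 98.
sum_of_squares = sum((x - x_bar) ** 2 for x in data)

divide_by_n = sqrt(sum_of_squares / n)               # about 5.72
divide_by_n_minus_1 = sqrt(sum_of_squares / (n - 1)) # 7.0

print(divide_by_n, divide_by_n_minus_1)
```

Dividing by the smaller number, n minus 1, gives the larger result, which is the "overestimating" described above.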

And then, to undo the squaring, we do a square root of the whole thing. This is a little bit strange. There's more math that goes into it that I'm not going to explain to you.

It would take a long time to do that. Do you see where the formula kind of comes from? Rather than me just putting it on the board and saying here's your formula, use it.

You guys see what we're doing here? We're finding the distance from a data value to the mean. We're squaring to make it positive. We're adding it all up and dividing, because what we're after is an average of distances.

And then we're square rooting that, kind of to get rid of that square, but also for a couple of other reasons. This is the underlying idea. Next time what we're gonna do, since we have like 40 seconds left, we're gonna do those bank line numbers. We'll find the standard deviation from that and look how that relates to our mean and our information.
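Written out in symbols, the steps described above (subtract the mean, square, add up, divide by n minus 1, take the square root) give the usual sample standard deviation formula. This is a reconstruction from the spoken description, since the board itself is not in the transcript; x is a data value, x-bar is the sample mean, and n is the number of values in the sample.

```latex
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
```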

So your homework from 3.2 that's online, you guys already got to that? Yeah. Well I think I gave it to you here also.

Okay, so find that, that'll be due on Monday.

Okay, so you'll notice on the board we have the equation for standard deviation I gave you last time. Now, I do want to refresh your memory on how we kind of made this thing up. Of course, our x's are our data values.

What's this thing mean? Mean. Means the mean, that's right. So this is the distance that every value is from the mean.

We're squaring it to make them all positive. We're adding them together and dividing to find the average, and the square root, that has some mathematical meaning to it, but pretty much since we're squaring it, we're square rooting it. That's kind of the idea here.

So this is how we developed our standard deviation. Now, there is one more very useful formula that we can use to get standard deviation. It's mathematically equivalent, but it's used a little bit differently. I'll show you that right now. So you have an option in this class to use whichever formula you want to find standard deviation.

It's kind of nice. Sometimes this one works really well. A lot of times this one works even faster.

I'll show that to you right now. So another formula that you can use instead of this one. You still take a square root, but you do some different things inside that square root. For instance, do you remember, by the way, what n stands for?

That n. What about it? The what now? The number of... the number of items in your sample.

Yeah, so it has to do with sample. It's not the capital letter. It's the lowercase letter. Some people are zoning out already.

Come on, zone in. Let's focus here, people. We have the number of items in your sample. What's this symbol mean, everybody?

Okay, we add together, what are those? The value of your item, whatever you're considering. You take n times the sum of these things squared. You subtract the sum of x: you add them all up, and then square it.

And then you divide by n times n minus 1. These two formulas magically will give you the same exact answer no matter which one you do. Now, the only difference between them, can you see what is here that's not over here? Say that louder, someone said it.

The mean. Yeah. On this one, you have to calculate the mean. On this one, do you have to calculate the mean? No.

So that makes it a little bit nicer in some cases. If you have a lot of data values, we're going to see this, and we're going to do stuff with like three data values, but if you have like hundreds of data values, you don't want to have to find the mean and then subtract everything from the mean, right?

That's kind of annoying. In either case, you're going to be doing a lot of work, but if you don't want to find the mean, you can use this one. Is that clear for you? So either way, I don't really care what you use.
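For reference, the second formula as it is described here and used in the worked example later (n, the sum of the x squareds, the sum of the x's squared, all over n times n minus 1) would be written like this. It is reconstructed from the spoken description rather than copied from the board.

```latex
s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}
```

Notice that x-bar does not appear anywhere in it, which is the difference being pointed out.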

We're going to practice both of them today to show you how it's done. So without further ado, let's go ahead and let's find the standard deviation for some of our examples that we used last time. We talked about the bank examples. You remember the bank examples where the people stood in line and waited a certain amount of time? All right.

So what I'd like to do right now is I'll find the standard deviation. Remember the standard deviation was the average distance from the mean, pretty much how spread apart your data is. That's what we're looking for.

We're going to find the standard deviation for the wait time that had the 14 minute one. Remember the 14 minute wait time? It was like 1, 3, and 14. We're going to find that standard deviation.

Find the standard deviation. Thank you. I know it's STD, but don't worry about that, okay? Standard, that's what that means.

Standard deviation of the values 1, 3, 17. I mean 14. Here's the process for doing this. It's kind of nice to make a table, like a chart up for you. So we're going to have a table here. And using this table, we're going to be able to fill out this information.

Now I'm going to show this to you twice. We'll do this formula first, then we'll do this formula, okay? So you can kind of see the difference here in how it's calculated.

You know, I might need a little bit more room over here. So what we do, we make our table up. We're going to put our x's in the far left column.

In this case, can you tell me what my x's are? Great, okay. They don't have to be in any particular order. It doesn't really matter. So that's our values for our x's.

It just means the values in your data set. That's it. Now, the next thing we've got to do, what's kind of important, we need to calculate x bar for this example, because of this formula: we need to take every value minus that x bar, minus the mean.

So off to the side, we're gonna calculate x bar. Now I think I've already had you do that, haven't I? How much was x bar for these three numbers? So we list out our data values 1, 3, 14. We find our sample mean, that's 6. We're just going to go from the inside out of this equation, of this formula. It's very much like order of operations.

We just start here, work our way out piece by piece, and this table is going to allow us to do that. The first thing we need to do, what do you think the first thing we need to do is? Before we sum, look at the inside of the sum. What's that say?

It says sum, right? What's after that? What's inside the sum?

Take the average and take it away from the x value. Okay, so take the mean, subtract the mean from x. Are you with me on that?

That's what this says. This is inside out, right? This is the most inside piece we have.

You have to do that first. So what we do here, before you start adding all this stuff up, that doesn't make sense. We have to find this piece first. This says you're going to add these pieces together.

So the first thing is you're going to make another column. I'm going to get a pen here. X minus X bar. That's what you're going to put in that column. So can you tell me the first thing that I should have here?

Notice how you're not doing 6 minus 1, right? Even though 6 is a bigger number, you're doing X minus X bar. It has to be that way.

So we're going to do 1 minus 6. How much are we going to get there? Very good. What's the next one we're going to do? Keep going.

How much? Okay, the next one, what are we going to do after that? And we get...

I'd like to show you something. Do you all agree that these are the actual distances from the mean? Do you agree on that? This is negative 5 units away from 6, or 5 spaces to the left from 6. This one's three spaces to the left from six.

This one's eight spaces to the right from six. Are you with me on that? Now, if we were to average these, average would mean add them up and divide by the total number, right?

But watch what happens. If you were to add these right now, what's negative five plus negative three plus eight? Zero. Yeah, that's because this is the average.

If you add up the distances from the average, you're going to get zero. That doesn't make any sense right there, right? Because you know these are away from the mean at some spot, at some point, but if you try to average them, you're going to add them up and they're going to make zero, looking like they're not far apart at all.

This happens because you have an equal amount to the left and an equal amount to the right of the average. Maybe not an equal number of values, but the equal amount of value, the equal number of spaces to the left and to the right. That's why we're going to square them.
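The cancellation is easy to check in a few lines of Python. This is only an illustration with made-up variable names, using the 1, 3, 14 data and its mean of 6.

```python
data = [1, 3, 14]
x_bar = sum(data) / len(data)             # 18 / 3 = 6.0

# Signed distances from the mean: x minus x-bar, in that order.
deviations = [x - x_bar for x in data]
print(deviations)                         # [-5.0, -3.0, 8.0]

# The negatives and the positive cancel out exactly.
print(sum(deviations))                    # 0.0
```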

So the next column it says you take your values, you subtract your mean from every one of them, we have that now, and then you're going to square it. So on the next column we're going to draw x minus x bar and we're going to square it. All it says for you to do is take that distance that you just calculated and square it.

That's going to make everything positive because when you square a number, no matter what it is, it becomes positive. So can you tell me what goes here?

How'd you get that? Okay, it was negative, but it doesn't really matter, because we're taking negative 5 times negative 5. True?

We get 25. Everybody, what's the next one? That wasn't everybody, but I'll take it. Yeah, it was 9. And everybody, what's the next one? 64. So we take each of those distances, we square them. So we're going to recap just a little bit so far.

We list out our x's. We find our mean. For the first little part, we take every value minus the mean we listed here.

Then we're going to square them all. That makes them all positive. That means when we add them up, we're going to get a positive number.

That's great: none of them are going to cancel out and make zero on us, looking like there's no variation. We don't want that to happen. So we square them so they're all positive now.

The next thing it says in our formula is, okay, after you square it, what's that mean again? Yeah, so we're going to, down here, we're going to add these up. So this is going to give us the sum of x minus x bar squared.

Question? Yeah, on your column header, you have x squared instead of x minus x bar squared. Oh, yeah. You know what? Thank you for that.

I was just seeing if you guys were paying attention. Did you buy that? No. Oh, OK. No, that's not true.

No, sorry. I meant to write x minus x bar, squared. Correct your notes if you had that wrong also.

Okay, so x minus x bar squared. Now we add that whole column, so go ahead and do that on your own. See what you get after you add those pieces.

What'd you get? We're almost done. We're trying to fill this formula out. And what we have so far, we have this whole numerator.

Do you guys see that that's accomplished already? We've done that part of it. This whole numerator happened to be 98. Now, we don't want to forget about the square root.

We have this large square root. But inside that square root, we're going to have 98. And on the denominator, what are you going to put on the denominator? Think about it.

What's the n mean? What's the n mean? Number items.

How many items do we have? So what are we going to put down here? How are you getting 2? So I'm going to put 3 minus 1 so you know where it's coming from.

But our n in this case is 3. How do we find the n? You just count them. So we have 3 items.

So, s equals, that's the standard deviation for a sample, the square root of 98 over 2. Okay, folks, what's 98 over 2? So, the square root of 49. What's the square root of 49? Nice.

Don't forget to take that square root. Now, this is kind of rare. Honestly, standard deviation is most oftentimes a decimal. It's rarely a whole number.

In this case, it happens to be a whole number. That's kind of nice for us. So what we got out of here is that the standard deviation is 7. If it's a decimal do you want us to leave it in with a radical?

No, no, we're going to find a decimal, to one more decimal place than the data that they give you, using the rounding rule. So here if we had whole numbers you're going to give me the standard deviation as like 2.4 or something like that. That's a good question. But no, we don't leave standard deviation as the square root.

I know that's kind of weird because in other math classes, you want exact values, right? You want like, oh, the square root of 2 is, we'll just leave it the square root of 2. You usually don't put the 1.41. Here, you're going to do that because we're going to use those numbers in calculations later on. And we'll talk about that in other cases. You'll see that later.

We get to something called the z-score, and you'll have to have that down. Okay, how many of you feel okay calculating the standard deviation using the method we just learned? Would you like to see the other one? Here's how the other one works. Are you sure there's no more questions on this?

You see that what we're doing in every case? Subtracting the mean, squaring them, adding them, and then we divide, and finally at the very end we take the square root after you've divided. Now our next one.
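The table method just walked through can be sketched in Python like this. The function name `sample_std_table` is hypothetical; the columns follow the board: x, x minus x-bar, and that difference squared.

```python
from math import sqrt

def sample_std_table(data):
    """Build the x, (x - x_bar), (x - x_bar)^2 table and return s."""
    n = len(data)
    x_bar = sum(data) / n
    # One row per data value, working from the inside of the formula out.
    rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in data]
    total = sum(squared for _, _, squared in rows)   # the numerator
    s = sqrt(total / (n - 1))                        # divide, then root
    return rows, total, s

rows, total, s = sample_std_table([1, 3, 14])
for x, deviation, squared in rows:
    print(f"{x:>3} {deviation:>6.1f} {squared:>6.1f}")
print("sum of squares:", total)   # 98.0
print("s =", s)                   # 7.0
```

The printed rows are 1, -5, 25; then 3, -3, 9; then 14, 8, 64, matching the columns filled in above.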

Let's go ahead and do this here. We're going to use this formula. Now our table is going to be a little bit shorter because we don't have to do anything with the mean.

We need just three columns. Maybe just two columns. You certainly need your x's.

Our x's are again 1, 3, 14. Those don't change. The other thing you're going to need is your what? Can you see from the formula what you need?

So we're going to take each of our x values and we're just going to simply square them. So let's do this together. What's the first one? One. I could have done that one. What's the second one?

Nine. Uh-huh. What's the next one?

One ninety-six. That's right. You have calculators, so no big deal. Hey, guess what?

You're ready to plug this in the formula now. That's kind of nice. Not a whole lot of work doing every item minus the mean. This is a little bit better for some people. They like this more.

So we're going to write out a large square root and in that we're just going to start filling out information. Now there are a couple of other things that we have to do here, mainly what it means to do this and what it means to do this. This one says... what are you going to do first?

Are you going to add the x's and then square it or square the x's and then add it? Square the x's. Sure.

Okay, that's this column. All we have to do to find this piece is add these together. Are you with me on that?

Maybe I'll give it a little bit more room. So here, we're going to get the sum of the x squareds. Has anyone gone through and actually added those already?

That's fast. Good. This one. This one's backwards, right?

This one's backwards from this one. This says you're going to add them first and then you're going to square them. So we also need to add this column together. How much is that? So our numerator, we're looking for n times the sum of x squared, minus the sum of x... squared (notice the pause, I'm emphasizing that a little bit differently), all over n times the quantity n minus 1. So there's a couple things.

We need to know the first one is n. How much is n? Participate with me. Come on. Verbalize it.

We should all be knowing this stuff. How much is n? Three. Good. All right.

So we're going to put three down. Can you tell me how much the sum of our x squareds were? Perfect. Then we have a minus sign, no problem.

The next thing is the sum of our x's, squared. What's the sum of our x's? 18. Just don't forget to square it. So you're going to put 18 here, great, but you're going to square it. You can't forget to do that.

That's really important. Otherwise, you're subtracting 18 versus whatever 18 squared is. It's a big difference.

On the denominator, not too bad. We're going to put, what's the first thing? Good. Times how much? Good, yeah.

So in this formula you're going to find you deal with slightly bigger numbers, because you're squaring this, that's pretty big, you're multiplying here, that's pretty big, but it doesn't matter because you have calculators, so whatever. How much is 3 times 206? And 18 squared was?

All over, down here we're going to have 3 times 2, yeah that's going to give us 6 eventually. Oh shoot. 618 minus 324, what did you get out of that?

One more time? Can you tell me how much 294 divided by 6 is? Then take a square root, how much are you going to get here? Same thing. You know what?

It has to be the same thing. So we get the same standard deviation no matter which formula you use. How many people preferred the first one?

Okay, fine. Use that one if you'd like. How many people preferred the second one?

Sure. In this case, it really doesn't make a whole lot of difference, does it? You have to do...

about the same amount of work. But if you don't want to find the mean, you want to do it just directly from the numbers, you have an option for that, no problem. If this one seemed more reasonable to you because it actually uses the distance and then kind of averages them, fine, use that one.
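As a rough check that the two formulas really agree, here is a minimal Python sketch run on the 1, 3, 14 data. The function names `sd_from_mean` and `sd_shortcut` are made up for this illustration and are not part of the lecture.

```python
import math

def sd_from_mean(values):
    """First formula: distances from the mean, squared, summed, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sd_shortcut(values):
    """Second formula: uses only n, the sum of x, and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)                      # 18 for this data
    sum_x_sq = sum(x * x for x in values)    # 206 for this data
    return math.sqrt((n * sum_x_sq - sum_x ** 2) / (n * (n - 1)))

wait_times = [1, 3, 14]
first = sd_from_mean(wait_times)
second = sd_shortcut(wait_times)
print(first, second)  # both come out to 7, matching the board work
```

Either function can be used on any list with at least two values; the shortcut version simply never computes the mean.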

It doesn't really matter to me. You're going to have to show me at least one time how to do this by hand. So you are going to have to use the formula. Now, if you'll remember, remember the calculator stuff we did last time?

Do you remember the S on there? I said we'll get to that in just a second. Do you remember that?

That's this. So your calculator will do this if you just plug the numbers in and press one variable statistics. It'll do it for you.

So I need to make sure you can use the formula, but after I have you do that, you're welcome to use your calculator to calculate this very, very quickly. You with me on that?

Because it will give you everything. Isn't that nice? It's kind of like cheating, huh?

Like legal cheating, that's awesome. Somehow it makes it slightly less fun than actually cheating, but it's still pretty cool. You guys still with me today?

You guys are so mellow. Monday. Alright. You know I'd like to have you do one on your own though, just to make sure you can. Why don't you try to find the standard deviation for the 4-7-7.

Do it the way that you want to. So some people use, if you raise your hand on the first one, use the first one. If you raise your hand on the second one, use the second one and see if you get the same thing.

Okay? Okay. Thank you.

Standard deviation on 477. Your homework is coming around. Make sure that goes quickly because I have three assignments to pass back to you. So go through that kind of thing. So if you're using the first method, you've got to find the mean first, subtract each value minus the mean, square them, add them, then divide and take the square root. Using the second method, just find the x squared column first, add those two columns, and do what that formula tells you to do.

Thank you. We're about to get started. I think I'm going to do the second way up here because I just feel more in a second way kind of mood today.

So we're going to try that. So if I'm doing this the second way, what's the first thing I need to do? Okay.

Do I need to find the mean if I'm doing the second way? So squaring them, I know I'm going to get 16, 49, 49. The next thing that you do is you add both those columns because you're going to be using both those sums in your formula. So if I add these together, I think I'm still going to get 18. Just make sure you use the right terminology here. This is the sum of x, this is not the sum of x squared. You're just adding those x terms and you're getting 18. The next one, when we add the x squared.

That's that one. That's the first one you're going to use in your formula. So I'm going to add those together.

Have you done that? And sure enough, we're ready. We're ready to plug this into our second formula.

So our standard deviation. We'll have a square root, our n is still 3, so we'll start this out with a 3 on the numerator, times the sum of our x squareds, which in this case is 114. Then we're going to subtract the sum of x, squared. So we'll take this column's total. And we'll square it.

After that, we divide by the 3 times 2. Hopefully you see where that's coming from. Just like in the last example, the 3's are n, the 2's coming because we're taking n minus 1. You notice a couple things about this I hope already. Do you see the difference, the only difference between these two examples?

What's the only difference between these two examples? Do you see in the formula? Look at this formula, look at this formula, what's the only difference? You can say it, it's okay, you can talk.

The what now? The 114. The 114 versus the 206. Do you notice how the 206 really jumped up because of that number? That number's an outlier. Do you see what's changing here is you're taking this and multiplying by the same thing just that 3. It's just those outliers make that grow drastically. This is the same in this case because when you add up those X's you get the same thing.

That's the same in this case. The only difference is this one has much more spread out data. That's what that's doing in this particular case.

That's why our standard deviation will be bigger here. Okay. How much is 3 times 114?

How much? 342. So we do our 342 minus 324, how much is that?

Is that right? Mm-hmm. Cool.

What now? Square root of 3. We're going to get the square root of 3. Good for you. Now, we're going to find the square root of 3 on your calculator, so don't forget to take the square root of 3. The standard deviation is not 3 here.

We're going to use that number for a different thing. Take the square root of 3 and you get 1.7 something. You can put 1.7 by the rounding rule, but I like to be a little bit more precise than that. I like to put two decimal places. So I'm breaking the rules, but whatever.

I'm a rebel, you know. So with two decimal places, we're going to put 1.7, and what's the next digit?

Perfect. That looks a little bit better to me. Hey, is there a difference in the standard deviation? Yeah, even though we have the same mean, standard deviation is drastically different like we thought.
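For reference, the substitution just worked on the board for 4, 7, 7 can be written out in one line. This is only a restatement of the numbers above (n = 3, sum of x squared = 114, sum of x = 18).

```latex
s = \sqrt{\frac{3(114) - (18)^2}{3(3-1)}}
  = \sqrt{\frac{342 - 324}{6}}
  = \sqrt{\frac{18}{6}}
  = \sqrt{3} \approx 1.73
```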

This one is going to be bigger. This has a much bigger spread. I just mentioned it was going to be a bigger standard deviation. This one's only 1.73. This one's 7. That's significantly bigger.
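A short Python sketch makes the same comparison, using the standard library's `statistics.stdev`, which divides by n minus 1 like the sample formula here. The variable names `close_data` and `spread_data` are just labels chosen for this illustration.

```python
import statistics

close_data = [4, 7, 7]     # values bunched near the mean
spread_data = [1, 3, 14]   # same total, much more spread out

for label, data in (("4, 7, 7", close_data), ("1, 3, 14", spread_data)):
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation
    print(f"{label}: mean = {mean}, s = {s:.2f}")
```

Both data sets report a mean of 6, so the mean alone cannot tell them apart; only the standard deviation shows the difference in spread.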

That means that this data is much more spread out than this data. Now, of course, we could see that just by looking at it, right? I mean, you can look at those numbers and go 4, 7, 7, those are all pretty close. 1, 3, 14, those are pretty spread apart.

Even the range would tell us that, but it doesn't give us a numerical quantity to actually act upon. It doesn't give us something we can calculate and manipulate.

And this method does. So these numbers tell you not only that the data is more spread out in this case than in this case, but we'll be able to work with that later on also. How many people feel okay calculating this standard deviation?

Good, all right. Very good. Now it may come, you may come across an opportunity to find the standard deviation of a population. And if you do, there's a couple different symbols for that. Standard deviation for a population.

You're going to notice, if you haven't noticed already, that with samples we use lowercase English type letters. With populations we're going to be using lowercase Greek type letters. Remember the mu thing? It's a Greek letter. And we had x bar, it's like an English letter with a bar on top.

It's the same thing with the population standard deviation. Oh, we have S for sample standard deviation, S, sample standard deviation, no problem. For a population standard deviation, we're going to have a lower case Greek version of S called lower case sigma. That's the standard deviation. It's like you draw, I know it's weird, right?

It's like you draw a circle and then put a line on top like that. Or you can do like this. Can we start here?

Pew, pew, pew. That you need? Go ahead. I have a question on the standard deviation. Sure.

Where? The 3 times 2 on the plus 2. Sure. This is coming from, I'm doing this formula.

N is 3. 3 minus 1 gives us our 2. Does that make sense? That would be the mean. What now?

That would be the mean. No, it's not the mean. We're just taking n as the number of items we're adding that we're manipulating. OK, so how many items do we have up here? Yeah.

So this says you take the 3, and then you multiply it times the 3 minus 1. That's how we're getting the 2. That's a good question. Thanks for that one. Any other questions?

Are we OK with the canon? I mean, sigma? So we get sigma.

It looks really similar. Still going to have a big square root. No problem. You're still going to have a sum.

You're still going to have x's because those are your data values. However, you're going to subtract, can I subtract x bar in this case? What am I going to use instead of x bar?

Yeah. Yeah. I'm going to square it.

And lastly, here's a big difference actually. I hope you're paying attention. I know if you got this, you're like, yeah, this is fine. But just watch.

If you're dealing with a population, you are not going to divide by n minus 1. You're going to divide by capital N. That's the number of items in your population. Now, what this does.

What the n minus 1 does is kind of purposely bump our standard deviation up a little, saying we have a little bit more spread than what we measured, because there we're not dealing with a whole population. We're dealing with a sample, and a sample doesn't perfectly represent our entire population.

That's the only reason why they do that. Here, we have our entire population, right? So this is like a legitimate average. You just divide by that number.
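Side by side, the two formulas described here look like this. The notation is a reconstruction from the spoken description: mu and capital N for the population, x bar and lowercase n for the sample.

```latex
\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}
\qquad \text{versus} \qquad
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
```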

That's what you're doing. So we don't have a minus 1 if you're dealing with a population. Which means, could you not get two different numbers depending on whether you're dealing with a sample or a population for the same data set? If I gave you these numbers, look at the board with me please real quick. If I gave you these numbers, 4, 7, 7, and I said, okay, that's a sample, you use that, you get 1.73.

Are you with me? If I said, okay, 4, 7, 7, that's a population, you use this, you're going to get something different than 1.73. It's going to be slightly lower than 1.73.

Are you with me on that? So it's not going to be 1.73, not the same thing. So you really have to understand and kind of grasp that we're going to be doing a different standard deviation depending on whether we have a population or whether we have a sample.
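To see the two answers for the same numbers, here is a small Python sketch. It relies on the standard library's `statistics.stdev` (divides by n minus 1) and `statistics.pstdev` (divides by N); the variable names are only for illustration.

```python
import statistics

data = [4, 7, 7]

sample_sd = statistics.stdev(data)        # treats the data as a sample: divide by n - 1
population_sd = statistics.pstdev(data)   # treats the data as a population: divide by N

print(round(sample_sd, 2))       # 1.73
print(round(population_sd, 2))   # 1.41
```

The population version works out to the square root of 2, since the squared distances from the mean add up to 6 and get divided by 3 instead of 2.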

So you're going to have to read the problem, aren't you? And know what you're dealing with. It'll say on there sometimes.

It'll say, like, in a sample of blank, blank, blank. Or, considering this to be a population, blank, blank, blank. It'll have that type of information. You just have to read it, know what you're doing.

Are you with me on this? Okay, you're also going to notice that's the only formula I'm going to give you for population because this one does not have a corollary. It doesn't go along with anything.

So this and that are your main standard deviations. Okay, that was for sample, this was for population. This is like a special case for sample.

Okay. You don't have that over here. So there's no like cheater case. There's no easy way to do that.

So the final answer for the second one, would it be 1.73 or three? Would you accept three? Three is not the standard deviation, no.

Three is not the standard deviation. You have to take the square root. Because we're going to talk about what this number is inside in just a second. Okay, so you can't just leave it as three.

It's not three. You take the square root of 3, that's what you get. You okay with that? In our test, you're going to tell us whether it's a sample or a population test? Of course, yeah.

And then that will inform you. Exactly. So I'll tell you what it's going to be. You have to use the appropriate one, right?

If I say population, you better be using that. If I say sample, you have those two to choose from. It's a good question. Great question so far. Any other questions that you guys have?

Right now this is probably brand new to you, right? Yeah. You've probably never seen standard deviation before. What is this? If you're still having a couple questions, now is the time.

All right. Well, there is one more thing we need to talk about today, and that's a word called variance. It's another way to measure variation, but we don't actually use it as much as standard deviation. This is like our bread and butter. We use this all the time.

So you're going to have to be pretty good at finding that. Just like we had sample standard deviation and we had population standard deviation, we're also going to have sample variance and population variance. Here's the nice thing about variance. You ready for it?

Variance is based directly on your standard deviation, or if you want to think about it this way, standard deviation is inherently based on your variance. Here's why. You want to see it?

You ready? It's kind of cool. You guys don't want to learn another formula, do you? Do you?

Some of you are like, yeah, bring it on, whatever. But no, you probably don't. Here's the deal.

If you've calculated standard deviation, you've automatically calculated, you've already calculated the variance. Some of you are a little lost out there.

If you've already calculated the standard deviation, you've automatically already calculated the variance. Here's the variance. Watch on the board. The variance is the number that you have before you take the square root. Okay, that's all it is.

So if you look at this, what's the variance for this example? What's the variance for this example here? It's 49. That's it. Before the square root, what's the variance in this example?

Do you see why I can't have you put standard deviation as three now? That's our variance, that's our standard deviation. The standard deviation is the square root of three, the variance is three. Are you with me on this?

So, it's very easy to find the variance if you have the standard deviation. You just don't take the square root, you've got the variance. That's pretty much it. Our symbols look like this.

For sample variance, because if you think about it, think about this. If that's our variance and that's our standard deviation, the variance is just the square of standard deviation. So our variance is S squared.

Our population variance is sigma squared. So for our example right here, our sample variance is, how much was that again? How much is our sample variance? The square root of 3 would give you 1.73, right? That's your standard deviation.

Your variance is the number without the square root. How much is our variance? 3. That's your variance in this case.

Our sample variance is just 3. How many of you are OK with that? Raise your hand if you're following that along. It's not still the square root of 3, because that again, that's going to give you your standard deviation back.

And that's how we're getting 1.73. Our variance is 3. Let's practice this one more time for this case over here. What's our variance here?

Is it 7 with a square root of 49 or 49 itself? What is it? 49. It's 49. The square root of 49 is 7. Also one more thing about math that you already know, I just want to make sure that you get this. So here we have 49, not the square root of 49, that gives you 7 again. When you take a square root, you should stop taking the square root after that, right?

So if you take the square root of 49, you don't end up with the square root of 7. Look at the board right now. Do you know that the square root of 49 is 7? It's not the square root of 7. You don't just keep on going and keep on going. Once you take a square root, you're done with that square root, and you stop writing it.
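The relationship between the two measures can be checked with a few lines of Python. This sketch uses the standard library's `statistics.variance`, which is the sample variance (s squared); the variable names are made up for the illustration.

```python
import statistics

close_data = [4, 7, 7]
spread_data = [1, 3, 14]

close_variance = statistics.variance(close_data)     # sample variance, s squared
spread_variance = statistics.variance(spread_data)

print("4, 7, 7  -> variance", close_variance,
      "standard deviation", round(statistics.stdev(close_data), 2))
print("1, 3, 14 -> variance", spread_variance,
      "standard deviation", round(statistics.stdev(spread_data), 2))
```

The variances are 3 and 49, and taking one square root of each gives back the 1.73 and 7 found earlier.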

Are you with me on that? Okay, so we found the standard deviation a couple times. We found the variance a couple times. Do you feel okay about this? Would you like to see how to do it on your calculator again?

Okay. Of course you would. That's the fun part.

Do I get to put down the screen and turn this thing on? It makes sense. Star Trek-y sound?

See? Did you hear it? So if you have a calculator, follow along and let's do this thing. The first thing that we need to do...

Go where? Where did we go again? Yeah, what class are we in?

Go to the... Oh no. You're back in the stand.

Don't die, Ryan. Three is not going to help. Can I borrow a calculator? Yeah, that's fine.

Thanks. So from your screen again you're going to go to stat, because we're in stats. It brings you to the edit button again. That's your list.

So if you click on that, it's going to bring you to this first list. Now, if you've got some numbers in here, here's how to clear those numbers. Watch carefully.

You can't press one button or else you'll delete the whole list. You can go item by item and press clear, or go up and highlight the L1. Press clear and then press enter.

Don't press delete. That will delete your whole list. You don't want to do that.

I mean it will remove the L1 from your screen. You don't want to remove your L1 from your screen. So press clear, not delete. Are you with me on that? On some of your calculators it will do that. So now we're going to go through and enter our data. Let's go ahead and enter the 4, 7, 7 just to make it easy. So it's 4, then down or enter, then 7, then 7. Make sure it's in there.

Right now it says four data items at the bottom because we have that fourth spot highlighted, but we have nothing in there. So it's really only three. If you go back up there, it says we have three items. Well, really it tells you the item that you're highlighting.

So we'll go back to stat. Go over to calc. That means the calculations that it can do for you. You're going to highlight one variable statistics again.

That's the only thing we've done so far. Highlight that. Do you remember how to find your L1 on your calculator?

You can do 2nd, find your L1, in this case it's the 1 button right there, it says L1 right on top. That will put it on the screen for you, you press enter, and it gives you all this nice information. Again, that sum of x, or sorry, the x bar, that is your what?

The sum of X, look at that. Isn't that something we just used in our calculation? So you could find it here if you really wanted to. Sum of X squared, we also used that one. Now the next two, that's what we're really looking at though.

The next two are your standard deviations. Notice the plural. Your calculator doesn't know whether you're dealing with a population or a sample.

You don't put in there anywhere, this is population data or this is sample data. You have to know that. But it will give you both calculations right away.

The only difference is whether you divide by n minus 1 or you divide by n. Notice how the population version is smaller. You see that?

Because this one's going to slightly overestimate it because it's a sample. That's just what it does. This was exactly what we had, wasn't it? Awesome.

So depending on what you need here, which one is our sample again, the top one or the bottom one? Is our S, that's sample standard deviation. This one's population standard deviation.
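For anyone without the calculator handy, the same summary can be sketched in Python. The function name `one_var_stats` and the dictionary labels are assumptions made for this illustration; the lecture does not name the calculator model, so the labels are not meant to match any particular screen.

```python
import statistics

def one_var_stats(values):
    """Return the summary values the lecture reads off the one-variable statistics screen."""
    return {
        "x_bar": statistics.mean(values),
        "sum_x": sum(values),
        "sum_x_sq": sum(x * x for x in values),
        "s": statistics.stdev(values),        # sample: divides by n - 1
        "sigma": statistics.pstdev(values),   # population: divides by n
        "n": len(values),
    }

summary = one_var_stats([4, 7, 7])
for label, value in summary.items():
    print(f"{label}: {value}")
```

As on the calculator, both standard deviations come back at once, and it is up to the reader to know whether the data is a sample or a population.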

And we found the median down below also, below these things. Do you feel okay about using your calculator? Right, yeah.

Can you repeat that one again? Go review this back online or come and see me. I'll help you with that after class, okay?

Great question. Any other questions? Yes?

It doesn't tell you the variance. You just have to square that. Yeah, that's exactly right.

That's a great point. That was my next point. You're like one step ahead.

Just one, though. No, I'm just kidding. I'm Korean. Real life situation, we're going to use this in conjunction with hypothesis testing to test questions about data.

Whether or not we have enough information to go forward and state that we're putting too little soda in our cans and we need to up our soda, because based on a random sample we found out that our cans were too low or too high. And so what we're going to be doing is talking about standard deviation, using that in conjunction with our mean, something called a z-score, something called a normal distribution, and figuring out how to do hypothesis testing.

So we have a long way to go. This is just one little, little piece. It's a big piece, but it's a little piece.

Does that make sense? It's an important piece, that's what I'm saying. Are you with me so far, folks?

Okay. So we'll be using that standard deviation. Next time, what we're gonna be doing, we'll talk a little bit more about some properties of standard deviation. I'll give you what's called the empirical rule, which is kind of a nice rule to use, a rule of thumb.

It's not exact, but it's very, very close. The one thing I did want to mention was your question there. When you have just used your calculator, don't pack up yet, it's really noisy. When you have just your calculator and you've just done your standard deviation, if I ask you for the variance, yeah, you're not going to be able to give me just 1.73, that's your standard deviation. In order to find your variance, you just square the number, whichever one you need, depending on what you're doing, okay?

Because the variance is the population standard deviation or the sample standard deviation, squared. So you take this and you square it. Try not to round it at all, otherwise you'll be off just a little bit. That'll give you your variance. As for whether you should round it or not: you can, but it's going to be off.

You know, I mean if you round 1.73 you're not going to get the three. If you round all of this, you might not get the three. This number is supposed to go forever. It'll be very close.
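The effect of rounding before squaring is easy to see numerically. This is a minimal sketch for the 4, 7, 7 sample, where the unrounded standard deviation is the square root of 3.

```python
import math

s = math.sqrt(3)            # unrounded sample standard deviation for 4, 7, 7
rounded_s = round(s, 2)     # 1.73, the value written on the board

print(rounded_s ** 2)       # a little under 3, because 1.73 was rounded down
print(s ** 2)               # 3, up to floating-point error
```

Squaring 1.73 gives about 2.9929, which is close to the variance of 3 but not equal to it.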

So that's a good question also. Any other questions, Mr. Chairman? Did today make sense for you? Good.

Good. Now what I've done is I've put the 3.3 homework on the website. If you wanna go there and get it, you may.

I will write it on the board from now on. I just wanted to make sure you knew where the website was one time. So now that you know where it's at, I will be posting that homework ahead of time if you wanna get a start on it. So this is not going to be due on Wednesday, but you can get a jump on it.

We have a couple more things to do. Everyone clear? Yes. Have a great couple days. I will see you on Wednesday.

Okay, so last time. We were talking about standard deviation and really all we did was calculate that, calculate some variance and talk about the relationship between the two of them. Today, we're going to talk about some of the properties of standard deviation to start us off.

Like I said, we'll get into coefficient of variation, to answer your question, Karina, in a little bit. So what we learned last time is a standard deviation shows how spread apart the data really is. So a large standard deviation would say that your data is relatively spread out.

Does that make sense? So the average distance from the mean is great. So the greater the distance from the mean, the greater the standard deviation saying my data is more spread out than close together. So it's all scattered. So a couple things.

Close data will have a small standard deviation. Spread apart data will have a large standard deviation. Thank you. Closely grouped data, it's going to have a small standard deviation, signifying that.

And then spread out data, data that's not closely grouped, it's like everywhere, all over the place, like the 1, 3, 14 that we had from the bank lines, that's going to have a larger standard deviation. Since we kind of understood the concept of standard deviation from last time, I hope that this makes sense to you. The standard deviation, roughly speaking, measures the average distance from the mean. If the distance from the average is way, way big, then you're going to have a way, way big standard deviation. If the distance from the mean is small, that means all the data is grouped around the center, right, where most things happen.

If it's all grouped around there, this standard deviation is really small. It's right in the center, right near the mean. If we have way different things, then we have a large standard deviation.

I'm not sure if you're okay with that. Okay. How much did you weigh today? All right, cool. So that's a couple properties about standard deviation that were kind of maybe obvious to you from last time.

There is one other thing that we can use the standard deviation to do, and that's using it in conjunction with what's called the empirical rule. Now, I do have to mention, have we talked about normal in here? Normal? A normal data distribution?

It has no skew to it. There's no outliers left or right. It climbs up to the average and then it goes back down.

It's a very nice bell-shaped curve. That's what we call normal. The data fits that basic shape. We call that a normal distribution.

The empirical rule only works with the normal distribution. It won't work with a skewed data set. So if our data set is normally distributed, we can use what's called the empirical rule. It's kind of a rule of thumb.

It's an approximation, but it's based on some math that I have to teach you later on in chapter six. So we're kind of jumping ahead just a little bit as far as the rule of thumb goes, but I will go back when we get to chapter six and explain how all these things were achieved. You okay with that?

Okay, so for right now, if a data set is normally distributed, we can use what's called the empirical rule. Thank you. Okay, what does the rule say?

It's a rule that says how much proportion or percentage of a data set will fall within certain standard deviations of the mean. Some people call it the 68, 95, 99.7 rule because of this. So here's what the empirical rule says.

It says if your data is normally distributed, this happens all the time. It's really quite interesting that this happens no matter the data set. If it's normally distributed, this is going to work. It's kind of crazy. But it says these three things. Approximately, if you have a normal distribution of data, 68% of the data will fall within one standard deviation of the mean.

Now that might not make, well that's such a, wait a second, that's really, what's that even mean? Here's what it means, okay? You all know how to calculate the mean.

You can calculate the standard deviation now, right? Right? Right, okay, good, I hope so. And so if I took the standard deviation of the heights, like heights of people, it's gonna be, I don't know, maybe three inches or something like that.

You'd be able to calculate that standard deviation. Are you with me on that? You'd also be able to calculate an average for any group. True?

If you take a random sample of people, the heights are generally normally distributed. They have an average. There's people above that average and people below that average. No one's in your sample, well, rarely are you getting someone in your sample that's like seven feet tall. Rarely, right?

Unless you're taking a sample like NBA players, but that doesn't really happen here, right? We don't have NBA players in our class. So if we took a random sample, our heights are going to be normally distributed. We're going to have a sample mean, and we'll also have a standard deviation. Okie dokie with that?

If our sample distribution is normally distributed, if our sample data is normally distributed, what it says is if you have the mean and I take one standard deviation, which is three inches in our case, and I'm making up off the top of my head right now, but let's say it is, I take three inches and add it and subtract it from the mean. Are you with me on this? So I'm getting a standard deviation above and a standard deviation below.

Around that average, 68% of you are going to fall in that range. That's what it says.

It says 68% of your sample will fall within that range. You understand the idea here? That's what the empirical rule says. It says, number one, 68% of the data will fall within one standard deviation of the mean.

Well the next question is, okay, if 68% falls in one standard deviation, what percentage falls within two standard deviations? That's the next rule. I said there's 68. The next one, do you remember me saying it?

It's in the 90s. It's like 95. 95%. So within two standard deviations, you're going to get 95% of your data.

That's a lot of the people. So 68 for the first. 95% of your data will fall within two standard deviations.

Lastly, if we go out to three standard deviations from the mean. Remember, falling within so many standard deviations means you're going from the mean to the right and to the left. That's the segment we're talking about. So two standard deviations would mean to the right two times and to the left two times.

That's number two. Three standard deviations means one, two, three. So three times up this way and three times down that way.

So between these two numbers, you would have six standard deviations. That's within three of the mean, all right? If you did that, you're going to get 99.7% of the data. Is that everybody?

Is that everybody? Is that a lot? Mathematically. Let's say you were talking about huge samples or a huge population mathematically.

Is it ever possible to cover everybody if you keep going up? Here it's 68%, then we have 95%, then 99.7%. Is it ever going to be 100%?

Practically, maybe. Because in this classroom, there's no one over a certain height, right? But theory-wise, no. It's a never-ending bell-shaped curve. So you keep going up and going out.
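The three percentages, and the fact that the total never quite reaches 100%, can be checked against the normal curve itself. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is made up for the illustration, and the underlying math is what the lecture defers to chapter six.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3, 4, 5):
    print(f"within {k} standard deviations: {share_within(k):.5%}")
```

The first three lines round to 68%, 95%, and 99.7%. Going out to four or five standard deviations gets closer and closer to 100% without ever reaching it.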

It's just the rareness of finding that person increases. Thank you. Here's what this says to us. Is it likely that we're going to find a piece of data that's outside three standard deviations from the mean?

Is it likely? If 99.7% of the data is in there, 99.7, that means out of a thousand people, only three of them are outside of there. So, if 99.7% of the people are within a certain range, is it likely by randomly picking someone, they're going to be outside of that range? Is that going to happen often? Or are most of them going to be within the range?

Clearly, yeah. 99.7% of the people are going to be within there. That's a vast, vast majority. In fact, within two standard deviations, you get how, what percentage?

That's... It's pretty likely that you're not going to find someone outside that range. Fairly likely, right? Being outside is not too common. Out of every 100 people, 95 are going to be within there.

Pretty likely that if you draw names out of a hat, you're going to find someone in that range. Isn't that true? Yeah, okay.

That means this gives us a rule of thumb for what's usual and what's not usual. If we have something that's within two standard deviations of the mean, 95%, we're going to call that piece of data a usual piece of data.

If it's outside of two standard deviations, for right now we call that unusual. Does that make sense to you? So pieces of data falling within two standard deviations of our mean are considered normal.

Yeah, that happens. That's usual. Outside of that, we're going to consider it unusual. So when you're asked on your homework, is this piece of data normal or usual?

I'm sorry, usual? Is this piece of data usual or unusual? You're going to go, well, let me think. Is it more than two standard deviations away from the mean?

If it is, then it's unusual. If it's within two standard deviations, then it would be considered usual. We'll do an example in just a minute to kind of illustrate this also.

I won't just do all the theory. So, if a piece of data is within two standard deviations of the mean, we call that usual. Or a better way to say it is a data value, not a piece of data.

I'm going to go delete it now. Piece of data, you know, you start talking about Star Trek again, data's the Android guy, piece of data is kind of strange, so, data value. So if a data value is within two standard deviations from the mean, that's considered a usual value, correct.

Outside of the two standard deviations, it would be unusual. Again, can we ever cover 100% of our data if we just keep going out standard deviations? Practically, for a certain data set, yes, we can.

But in theory, no, you really can't. Because if you consider the entire population, there might always be the chance that you're going to get something higher. Always might be the chance, or lower, than what you could imagine. Have you ever seen those pictures of those guys?

I mean, we're talking about height. With height, we all understand that, so I'm using that example. But you generally don't get people who are 12 feet tall, right?

Have you ever seen someone who's 12 feet tall? You ever heard of anybody who's 12 feet tall? You didn't. I've heard of 8 feet, though. Haven't you heard of 8 feet tall?

Andre the Giant's like eight feet tall or something crazy like that. I've heard of two feet tall. There's like this two-foot-tall fully grown person.

Fully grown, two feet tall. So, I mean, you get those extremes. But height isn't a real good example to say you can't ever get there.

But something like... like measurements are, like measurements of number of cells in a cancer, what do you call those things? Someone in here has got to be in biology, little, I forgot the name, like a tumor. There you go. Could you have a number of cells in a cancer tumor?

Could it, does it ever have to have a highest amount? As tumors keep growing and growing and growing, right? So in that case, I mean, there's really no upper limit. Keep going and going and going.

So sometimes you can't ever get to the end of our data set. You'd have this outlier where that cancer tumor has a zillion cells or something crazy like that. It's possible theory-wise we can't ever get to the end. Practically, yeah, with height and maybe with weight we can actually go enough standard deviations to cover all of our data, but theory-wise we can't.

So within two standard deviations is usual. If you've got something that's lying outside of three standard deviations from the mean, that is really, really unusual because only three items out of every thousand items are going to fall outside of that. So randomly picking something and saying, oh, what's the chance that's going to be outside three standard deviations? That's very rare. We're going to ultimately use this information, along with something called a hypothesis test, to make decisions about the data that we're going to collect. Okay, that's really where we're going with this.

Right now I just need you to understand what a normal shape is. Do you understand what a normal shape is?

And if you have something normal, then we have things that are considered usual: within one is usual, within two is usual, outside of two is unusual. And we have things that are very unusual: outside of three is extremely rare, very, very rare. So I'm gonna write that piece of information down too. A data value outside three standard deviations of the mean is extremely rare.
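If you want to check where the 68, 95, and 99.7 figures come from, here is a small sketch using Python's standard-library `statistics.NormalDist`. The helper name `fraction_within` is made up for illustration. The exact fractions it computes are slightly different from the rounded percentages used in class, which is expected.

```python
from statistics import NormalDist


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    inside = fraction_within(k)
    outside_per_1000 = 1000 * (1 - inside)
    print(f"within {k} sd: {inside:.2%}; outside: about {outside_per_1000:.0f} per 1000")
```

For three standard deviations the "outside" count comes out to roughly 3 per 1000, which matches the "three items out of every thousand" figure.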

Let me give you kind of a graphic picture of what this looks like, by the way, just so you have an idea. Can you show me with your hands what a normal distribution would look like? Show me. Does it look like this?

Show me. Good, okay. Here, show me.

Uh-huh. Like a... A little hill.

A little hill. Yeah, kind of like a little hill. Some of you were, and you all were doing this, so...

It's very cute. Kind of like that. Looks like one of the better ones I've drawn. Hey, all right. Now, on your normal distribution, what's going to happen is where your mean is, is where most of your data is grouped.

And in fact, in a normal distribution, it is right smack dab in the center. That's where your average, your arithmetic mean will be. So right here.

would be your X bar or your mu if you're talking about population. Okay, I'm just using X bar since we've been talking about samples lately. And you'd be able to go ahead and calculate your sample standard deviation.

And here's what this is saying. If I go over one standard deviation to the right and one standard deviation to the left.

By the way, how can I get to this point if that's one standard deviation? I mean mathematically. How could I get there?

How could I say this is one standard deviation to the right? Let's do an example if you're unsure about that. I'll do it over here.

We'll fill both in at the same time. I see a lot of blank looks right now. Okay, let me make up some information.

This is not accurate, I'm just making it up to illustrate, okay? So here's our sample. What we found that the heights, we measured the heights of people, we were working with heights. We measured the heights of a certain group of people and it came out that it was normally distributed.

Is it important for me to tell you that heights are normally distributed for us to use this information? If it's not normally distributed, can you do this stuff? Not that.

Not empirical rules. It's got to be normal. Okay, heights are normally distributed. I have to give you some other information.

Let's say we had a mean of 65 inches. About how tall is that? 65 inches.

65? Hopefully not 65. 65 inches. Yes. 65? 65. 65. 65. With a mean...

65 inches, that would be 5'5". Right? 60... Yeah. 5'5". So we have a mean of 65 inches and a standard deviation, we keep it easy, of 3 inches. That's not bad. By the way, just so we kind of spiral into information here, if we have a standard deviation of 3 inches, can you please tell me what my variance is? How'd you get that?

Square. Ah. So variance and standard deviation, one's just the square of the other. True. So if you have standard deviation is 3, your variance is automatically 9. It's not a hard thing, just square.
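As a quick check of that relationship, here is a tiny sketch using the 3-inch standard deviation from the example. The variable names are just for illustration.

```python
import math

s = 3                        # sample standard deviation, in inches
variance = s ** 2            # variance is the standard deviation squared
print(variance)              # 9 (in square inches)
print(math.sqrt(variance))   # 3.0, taking the square root gets you back to s
```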

That's what we're doing here. Okay, so standard deviation is 3 inches, the mean is 65 inches. If I say normally distributed, we're going to get a picture just like this.

Geez, I'm getting good at those. Look at that. That's pretty good. It's like out of a book.

Right? It's like I've done this once or twice before. I have.

It's my third time. What goes right in the middle? What number is going to go there? Not a zero because that's not... 65. 65 because that's, what is that again?

Mean. And what letters are we going to use to represent our mean in this case? X bar.

Why are we not using mu for right now? Sample. Very good.

I said sample. I didn't say population. I said sample. So, for some sample, we got X bar is 65 inches.

Awesome. Now, what I need you to do is be able to find how much one standard deviation within the mean is, how much two standard deviations within the mean is. And 3. So we have a mean of 65 inches and we have a standard deviation of what was our standard deviation?

What letter do you use to represent that? S. S or lower case sigma, which one?

Why? Why S? Again, sample. X bar and S, they go together.

Mu and that, the canon thing, like that, the S, sigma, go together also. So my question is, okay, you have this information. How am I going to find something that is one standard deviation away from the mean? How much is the standard deviation again?

What was it? Yeah. And what was the mean? What's one standard deviation away? 66 would be one unit.

How many standard deviations, how many units is our standard deviation this? 68. Aha. So our mean is 65. Our standard deviation is 3 inches, right? 68. If we want to go one standard deviation, it's 68. How do you get from here to here?

68 inches. Sure, yeah, what did you do to those things? So if this is our X bar, to get from here to here, to get from X bar to one standard deviation away, all you had to do was add S, or in our case, it was a 3. Does that make sense to you?

Now this gives us one standard deviation to the right. Can you tell me what's one standard deviation to the left? 62. Good, 62. Are you all sticking, are you all understanding how I'm getting these numbers? I'm not Harry Potter today, right?

This is not my wand. This is just going with standard deviation. So if our standard deviation is 3 inches, we go to the right 3 inches, we go to the left 3 inches, minus S. That gives us within one standard deviation.

Raise your hand if you understand that. So the range of 62 to 68 inches is within one standard deviation of the mean. Now you tell me, we've just done this.

Are you okay doing this yourself? We've just done this. What percentage of the people in my sample are going to fall within 62 and 68 inches? 68 what? Percent.

So between here and here, and here's how you can represent this. We haven't really got this far as far as the area goes. Here's how you can represent that portion of people on your chart.

If you draw this and treat it like an area, those would be straight. I'm not good at straight lines. 68% of the people are going to fall right there. So far so good? Now on your own, what I want you to do is calculate the next standard deviation away, okay?

Both to the right and to the left. Did I already pass 4? Awesome. Ok, what's the next upper limit for our standard deviation here? What are we doing?

They're going to be the same distance because you added 3 again, didn't you? I hope you did. They'll be right there.

So you either just keep adding 3 or if you really want to do this, from here to here, Couldn't you have just added 2 times S? 2 times S is 6, right? Could you just add 6 to this number and get there? Yeah, it's the same thing as adding 3 twice. So you could do that.

You'd add 3, 2 or 3 times, whatever. Or just multiply it once and add that together. Or the lower limit's going to give us how much? Okay. And what we know about this is that this is within two standard deviations.

So the range from 59 to 71, that gives us two standard deviations from the mean. That's to the left, that's to the right. What percentage? Ninety-five percent.

The purple is going to be ninety-five percent. Can you see the difference between the black and the purple on that one? So 68%, that's just this middle range. 95% that covers this range. If I incorporate another standard deviation, we can do this one together, we're going to go out another unit.

What am I going to get at the top end? How much? And someone else at the low end, what am I going to get? This would give me three standard deviations away. You notice that we could just multiply 3 times S and add it.

We could subtract 3 times S, and that's going to give us three standard deviations away. Now, what percentage is within three standard deviations? I don't want to muck this one up anymore. I said muck, muck, muck it, muck it up. I don't want to mess that up anymore.

99.7%. Oop. I already did, didn't I? So within 3, 99.7.

That begs the question, how much is within everything if you go on to infinity, to negative infinity, to infinity? Yeah, but what percentage of the people would be covered in that? That would be everybody.

And what we're going to do in the future is relate 100% to the number one, and we'll talk about proportions, be able to use that information very accurately. Okay, I did over here what I wanted to do, which is to show you that if we go out one standard deviation, we're at 68%.

If we go out two, we're 95%. If we go out three, we're 99.7. So we don't need to fill this one out again. We just did that.
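The height example can be reproduced with a short sketch. The function name `empirical_ranges` is made up for illustration, and it assumes the data is normally distributed, which is the condition for using the empirical rule at all.

```python
def empirical_ranges(mean, sd):
    """Return (k, low, high, percent) for k = 1, 2, 3 standard deviations.

    Only meaningful for normally distributed data.
    """
    percents = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical-rule percentages
    return [(k, mean - k * sd, mean + k * sd, pct) for k, pct in percents.items()]


# Heights from the example: mean 65 inches, standard deviation 3 inches.
for k, low, high, pct in empirical_ranges(65, 3):
    print(f"within {k} sd: {low} to {high} inches, about {pct}% of the data")
```

This gives the same ranges as on the board: 62 to 68, 59 to 71, and 56 to 74 inches.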

Would you nod your head if you're okay on that example? I feel pretty good about that. I'm going to have you do one on your own just to make sure that you can get it. Mean is 34 pounds.

34 pounds. Remember LBS at the very end stands for pounds. Mean is 34 pounds.

Standard deviation is 8 pounds. Let's see. Find out what percentage of the people, or what percentage of my data, not people, because I don't think anybody's going to be 34 pounds.

I don't think so. No. Maybe babies. Tom, you're 34 pounds?

Congratulations, man. What's your diet? Rice cakes.

They weigh less than air, so... You should drink helium. That's my breakfast today. What percentage of data will fall between 10 pounds and 58 pounds?

Figure that out. Do it on your own. If you know it, don't say it out loud.

Just figure it out on your own. I want you to understand that we don't have to always go this way. We don't have to figure out the standard deviations.

We can take, or to figure out the... of our data, we can take the limit of our data, use that in conjunction with our standard deviation and mean to figure out what percent of our data falls in that range. So figure that out. Think about it for a few seconds.

I want you to struggle with it in your head first. Okay? I want you to kind of grasp it.

I don't want to just tell it to you. So think about how you might do that. There's going to be several ideas. There's several ways to actually go about doing it.

I'll give you what I think is the best one, but there's several ways to do it. Has anybody by a show of hands, has anybody got this yet? Know what percentage is within there? Okay if you're still struggling let me give you some hints here.

Let me give you some hints. Firstly do you know your standard deviation? Can you find out? How many standard deviations fit between this number and that number?

Don't say no, try that. Try that. That's one option.

See how many standard deviations fit in there. What I mean by that is you start with 10, right? That's your lower limit here. You start with 10. Start adding 8 to it and see how many 8's go into that. That's one way to do it.

It's kind of a... trying to nail a hammer together, trying to nail a house together with a rock. It's a little bit too complicated, but, I mean, you can do it. Complicated?

A little too hard? I can see that. Did you get a number by doing that?

You probably got, anybody else get six? Okay, what that tells you is from here to here, you're six standard deviations away, right? Six standard deviations away.

Isn't this, one, two, three, four, isn't that six standard deviations away from each other? Now, of course, this has to be centered on the mean, so if you do it that way, you're missing a vital piece of information. You haven't used the mean at all. What if this six, or you said six, right?

What if that difference of six standard deviations was not even close to the mean? Well, then this really wouldn't work. But if it's centered around the mean, then that would work just fine, okay? So we have to check that. Now, here's how you do it.

You want to calculate how far away from the mean this one is and how far away from the mean this one is. So how you do that, subtract it. Why don't you do this for me right now? What's your mean?

Take your 58 minus 34, okay? Take your 58 minus 34. Here's your 34. Here's your 58. What's the distance between those two numbers? Twenty-four. Okay. Now, I want you to take your 34 minus your 10, because 34 is here, 10 is obviously to the left.

I want you to find the distance there. What's the difference between 34 and 10? Twenty-four. So would you say that these are equidistant from the mean? So this is centered around the mean, right?

And this is a normal, and we're assuming this is a normal distribution. So this falls into this empirical rule category. It's centered around the mean.

It's normally distributed. Therefore, we can use empirical rule now instead of adding 8 six times. There's just maybe a different way to do that.

Think about this. If this distance is 24 and your standard deviation is 8, can you figure out how many times the standard deviation goes into 24 without doing the hard way of, let's see, 8 plus 8 plus 8 is 24. Can you do it a different way? Say that? Divide it.

We can divide it, right? Could you divide it? We find the distance between your mean and the number that I want to calculate here, the number I want to find the percentage that falls between these.

Take that distance, maybe divide this by your standard deviation, and that will tell you the number of standard deviations that it is away from the mean. Now here's the cool thing about this. Do you necessarily have to get a whole number all the time? Here we will, because we're using the empirical rule.

But in the future, what if I made that 59? You're going to get 25, right? Please say yes.

If I divide that by 8, am I going to get a whole number? No, I'm not. But it would still give me, this is interesting, it would still give me the number of standard deviations away from the mean. It just won't be 1, 2, or 3. It would be like 3.2, 3.1, something like that.

We're also going to be able to use that information. You don't have to be a certain number of standard deviations away from the mean. That would be impossible because that would say, oh, everybody in the world is only 62 or 68 inches or 59 or 71 inches or 56 or 74 inches. Are you one of those six measurements? Are you?

Maybe some of you are. I'm 72, so I don't really, I don't fall in one of these six. So I am not exactly two standard deviations away.

Are you getting the picture here? Not every piece of data is either one or two or three. You could be, I am 2 point, like 2.3 away from the mean. I'm not exactly that measurement. You with me on that?

So you don't necessarily have to get a precise whole number of standard deviations away from the mean. Here we will because we're using the empirical rule, but in the future we're not going to. The idea is this, though. You find the distance between your data value and your mean.

You divide it. You divide it by the standard deviation. Notice that x minus x bar, I'm using symbols now.

Do you see that that is x minus x bar? X minus x bar gave you 24. Nod your head if you're okay with that. Okay, I'm kind of previewing the next section for you. You're going to see this on Friday. I'm previewing it.

It's kind of nice. This gives you the distance from a data value to the mean, agreed or not. If we divide by the standard deviation itself, which would be our s. X minus X bar divided by S. That will give you the number of standard deviations away from the mean a data value is.

In our case, 24 divided by 8 gives us 3. We do the same thing over here. We do the same thing over here. We're not going to quite worry about the sign right now because I know if I take x minus x bar I'll get a negative. Now that's going to help us out in the next section, but right now I just want you to get the concept down.
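Here is that calculation as a small sketch. The function name `sds_from_mean` is made up for illustration. It keeps the sign, so values below the mean come out negative, which is the part we're not worrying about yet.

```python
def sds_from_mean(x, mean, sd):
    """Signed number of standard deviations that x lies from the mean: (x - mean) / sd."""
    return (x - mean) / sd


# Weight example: mean 34 pounds, standard deviation 8 pounds.
print(sds_from_mean(58, 34, 8))  # 3.0
print(sds_from_mean(10, 34, 8))  # -3.0 (negative: 10 is to the left of the mean)
print(sds_from_mean(59, 34, 8))  # 3.125, not a whole number

# Height example: mean 65 inches, standard deviation 3 inches.
print(round(sds_from_mean(72, 65, 3), 2))  # 2.33
```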

The distance is still 24. We're still dividing by 8, so you're also getting three standard deviations. Now with this information, with knowing that it's normally distributed, with knowing it's centered around the mean, and knowing that you are three standard deviations to the left and three standard deviations to the right of the mean, can you tell me what percentage of data falls between 10 and 58? 58 pounds please.

Now that was a huge build up, and it's like two people, come on. You should know, how many percent falls within that range? Yeah, that's right. That's three standard deviations to the left and to the right. So whatever population or sample I'm working with, in this case a sample, whatever sample I'm working with, we had an average of 34 pounds, standard deviation was 8 pounds.

So what I can tell you is that if I go from 10 pounds to 58 pounds, that's going to cover almost darn everything in there. 99.7% of our data values. Richard, have you understood what we talked about so far? Good.
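The whole procedure (check that the range is centered on the mean, count the standard deviations, then look up the percentage) can be sketched like this. The function name `percent_between` and the `EMPIRICAL` table are made up for illustration, and the sketch assumes normally distributed data. It deliberately refuses ranges that aren't centered or aren't exactly 1, 2, or 3 standard deviations out, because the empirical rule alone can't answer those.

```python
import math

EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical-rule percentages


def percent_between(low, high, mean, sd):
    """Percent of normally distributed data between low and high.

    Works only when the range is centered on the mean and the limits
    are exactly 1, 2, or 3 standard deviations away.
    """
    below = (mean - low) / sd    # standard deviations to the left
    above = (high - mean) / sd   # standard deviations to the right
    if not math.isclose(below, above):
        raise ValueError("range is not centered on the mean")
    k = round(below)
    if not math.isclose(below, k) or k not in EMPIRICAL:
        raise ValueError("limits are not 1, 2, or 3 standard deviations away")
    return EMPIRICAL[k]


# Weight example: mean 34 pounds, standard deviation 8 pounds.
print(percent_between(10, 58, 34, 8))  # 99.7
```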

Now the last thing is a very quick thing. Unfortunately, standard deviations themselves can't be compared. I'll give you an example about this.

Here's our heights and here's our weights. So I say that for height, we had a mean and standard deviation in each case. Our height had a mean of 65 inches. Our standard deviation was 3 inches.

For our weight, let's say we were calculating the heights of people and the weights of those same people. Let's say the average was 175 pounds, and the standard deviation was, let's say, 15 pounds. Let me be more drastic than that.

The standard deviation was 4 pounds. So you get the idea more, okay? Here's the point.

Which one has a numerically bigger standard deviation? Weight. Is that better?

Weight. The weight does, sure. That's four, the other one is three. So numerically, that has a bigger standard deviation. You agree?

Okay. Which has more variation? Even though this standard deviation is bigger, can you say this has more variation?

Yes. What do you think? I'll ask the question again. So this has a numerically bigger standard deviation, right?

This one's 3, this one's 4. Just by looking at the numbers, can you say this one has more spread in the data than this one? And the answer is absolutely not. No.

So, no. Why not? You're dealing with different units, people. You're not even talking about the same thing here. You're talking about three inches compared to a mean of 65, right? You're talking about four pounds compared to a mean weight of 175. The information there is different. It's not comparable.

You can't compare those two things. Just because something has a larger number numerically as far as standard deviation, you're talking about inches here and pounds here. Those things don't directly correlate.

We have to have some way to actually represent what has more spread. We have two ways to do it. One's called the coefficient of variation, which translates a standard deviation in comparison to its mean to a percentage.

It says, okay, this one's coefficient is this percent, therefore this is varying more or less than another one. You can't compare the standard deviations directly. It's impossible. You can't do it.

We have another way called a z-score, which I'll tell you about next time. Actually, we've kind of already done it. This is the underlying concept for that.

I'll just flesh it out on Friday. But for right now, coefficient of variation does this one thing. It won't take me a minute.

A minute and 20 seconds. What it does is it compares the standard deviation of a sample to the sample's mean, and it multiplies by 100 to get a percentage. So here's what we're looking at. It's a very easy calculation.

You take the standard deviation, you divide by the mean, you multiply it by 100 percent. Okay? It's a decimal. Multiply by 100, all it does is move the decimal place twice.

Okay? So in example one. I know you all should have some calculators out.

Can you take the standard deviation, 3, divide it by 65, multiply it by 100%, and tell me what that is? Come on, I only have 30 seconds. How much?

This is 4.6? And this one, please. 4 divided by 175. 2.3, very good.

Those are both percents. Now, it doesn't say that this data varies 4.6%. It's just a way to compare them.
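Here is the same calculation as a short sketch, using the numbers from the board: 3 inches against a mean of 65 for height, and 4 pounds against a mean of 175 for weight. The function name `coefficient_of_variation` is made up for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


height_cv = coefficient_of_variation(3, 65)    # inches
weight_cv = coefficient_of_variation(4, 175)   # pounds

print(round(height_cv, 1))  # 4.6
print(round(weight_cv, 1))  # 2.3
```

Because the units cancel in the division, the two percentages can be put side by side even though one started in inches and the other in pounds.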

So look at the numbers. Which one actually varies more, the top one or the bottom one? Which one varies more? Top one.

Top one. Yeah. It has a larger coefficient of variation. So even though the weight's standard deviation was numerically larger, hey, the height is actually varying more because you have to compare it back to the mean. How many people want to show what we talked about today?

Good. We're going to stop there. We are done.
