Transcript for:
Understanding Z-Scores and Normal Distribution

So we're going to use this scenario to work through how we can take a value, convert it to a z-score, and then look up the probability for that value. We've got this scenario here where we're given some information: the mean total cholesterol level for men is approximately 209 milligrams per deciliter of blood, with a standard deviation of 35 milligrams per deciliter. So here we've got a mean and a standard deviation, and we are going to assume that total cholesterol level follows a normal distribution. The two things we need to know to determine the shape of that distribution are the mean and the standard deviation. So this midpoint here will be the mean of 209, and moving over one unit, one standard deviation, changes the value by 35 milligrams per deciliter. Okay. So we've got three questions here. The first asks, what percentage? So we can find a proportion or a probability and convert it to a percentage. So, what percentage of men have a total cholesterol level less than 225? The second asks about a proportion; we're still looking at a value of 225, but now we're looking at more than that. And the third asks, what is the probability of being between two values? So, just like we said in our steps, the first thing to do is convert the value we're looking at to a z-score, and we do that by taking x, the value we're trying to convert, minus the mean, divided by the standard deviation. So we're going to calculate the z-score for a total cholesterol level of 225 milligrams per deciliter. Now, before you even calculate it, the very first thing you should be asking yourself is: is this value above or below the mean? If it's above, we expect the z-score to be positive. If it's below, we expect the z-score to be negative.
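As a rough sketch of the conversion just described, the z-score formula can be written as a small Python function. The names `z_score`, `MEAN`, and `SD` are illustrative choices, not something from the lecture; the numbers are the ones from the cholesterol scenario.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Values from the cholesterol scenario (mg/dL).
MEAN = 209.0
SD = 35.0

z = z_score(225.0, MEAN, SD)

# Sanity check before trusting the number: 225 is above the mean,
# so the z-score should come out positive.
expected_positive = 225.0 > MEAN
assert (z > 0) == expected_positive

print(round(z, 2))  # 16 / 35 rounds to 0.46
```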
And if you get a result that deviates from that expectation, then you've possibly made an error in the calculation, in specifying this numerator. Okay, so we would expect it to be positive because this value is higher than our mean. And the other check that I do very quickly in my head is to estimate how many standard deviations above the mean this is. One standard deviation above the mean would be 209 plus 35, which is 244. But 225 is not quite there; it's 16 units above the mean. So the z-score is definitely not going to be more than one. And that's also why the second step, which we said is optional but highly recommended, is to draw what we expect. So, given that we've got a z-score of 0.46, it's not quite one standard deviation above. In fact, it's closer to about half a standard deviation above. Now, before I even start looking at the distribution tables, what I want to do is very roughly estimate what I expect my probability to be when I look it up on the table. I know that from the mean, where z is zero, down to negative infinity, so all of that gray area up to about here, will be 0.5, because the mean in a normal distribution cuts the distribution in half. The total area under the curve is one, so half of that is going to be 0.5. So I definitely know that my probability, when I look it up, is going to be more than 0.5, because I've got a little bit more of the gray area from here to here. I also know, because of the empirical rule, if you go back and take a look at that, that from the mean to one standard deviation above we have 34%. So from one standard deviation above, all the way down to negative infinity, that's going to be the 0.5 plus another 34%.
So before I've even looked up anything on a distribution table, I definitely know that my probability is going to be somewhere between 0.5 and 0.84. So now I'm going to actually head to my distribution table and look it up. Sorry, before we do that, just checking the question: what percentage of men have a total cholesterol level less than 225? That's definitely where we're looking at the area to the left of that value. Less than: these people will have lower values. So 209 is the midpoint, and values decrease on this side and increase on that side relative to 209. I want less than 225, so I'm looking at the area to the left. I have a positive z-score of 0.46, rounding to two decimal places, so I'm using the positive z distribution table. Just going to zoom in a little bit. The z-score starts with 0.4, and the second decimal place was 6, so I go to the 6 here and look at where those line up. So this is the row I'm going to be using, 0.4, and the 6 means I'm going to use this column, and the area to the left of that value is 0.6772. So when I go here, you can see the probability, or the proportion, of z being less than 0.46 is 0.6772. The question did ask for a percentage, so I'm going to convert that to a percent by multiplying by 100, and then the statement is just rounded off: instead of saying about 67.72%, we're saying about 68% of men from this population will have a total cholesterol level less than 225 milligrams per deciliter.
Now the next question says proportion, so we're not going to have to do that step of converting to a percentage. The tables give us probability and proportion as is. But now we want total cholesterol equal to or more than 225. It's the same x value, so we don't have to recalculate the z-score. And our diagram looks pretty much the same, except now we're looking for the area to the right of that value.
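The table lookup above can be cross-checked in code. This is a minimal sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of z = 0.46, the same quantity the positive z table gives.
p_less = standard_normal.cdf(0.46)
print(round(p_less, 4))     # 0.6772, matching the table entry

# The question asked for a percentage, so multiply by 100.
print(round(p_less * 100))  # 68

# Working from the raw values instead avoids rounding z to two decimals,
# so the result differs slightly from the table-based answer.
p_unrounded = NormalDist(mu=209, sigma=35).cdf(225)
print(p_unrounded)
```

The table-based answer and the unrounded one agree to about two decimal places, which is why the final statement is rounded to "about 68%".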
So the position hasn't changed, but now we're looking at the area to the right. We don't really need to look it up on the table again because the z-score hasn't changed. We're simply going to take our total area under the curve of one and subtract the area to the left that we looked up in the first question, to give us what was remaining, and so our proportion of men is 0.3228. Now, we've also given the statement as a percentage. You didn't have to do that for this question because it actually asked for a proportion, and this is a proportion. Proportion and probability are often used interchangeably. They go from zero to one. So about 0.32 of men from the population will have a total cholesterol level greater than or equal to 225 milligrams per deciliter.
And then the last question: this is one where you're working between two values and therefore between two z-scores. So we convert each of those to a z-score. You've got the 174, and that gives you a z-score of negative one. We knew it was going to be negative because it's less than the mean, and this one we know is going to be positive because it's greater than our mean. And we can see that they're actually the same distance away from the mean; one is just below the mean and the other is above. So this negative one means one standard deviation below: if we go 209 minus one of these 35, it gives us the 174, and the same if we go 209 plus one standard deviation of 35, it gives us the 244. And so this is what we're looking for: the area under the curve between these two values. And again, I don't actually need to use my distribution table, though we will still look up those values to verify. We've seen these two values before, one standard deviation above and below the mean. That's part of our empirical rule.
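The last two questions can be sketched the same way, again assuming Python 3.8 or later for `statistics.NormalDist`, with illustrative variable names. The first uses the complement of the left-hand area; the second takes the left-hand area at the upper z-score and subtracts the left-hand area at the lower one.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Area to the right of z = 0.46 is the total area (1) minus the area to the left.
p_at_least = 1 - standard_normal.cdf(0.46)
print(round(p_at_least, 4))  # 0.3228

# z-scores for 174 and 244 mg/dL, with mean 209 and standard deviation 35.
z_low = (174 - 209) / 35    # -1.0
z_high = (244 - 209) / 35   # 1.0

# Area between the two z-scores: bigger left-hand area minus the smaller one.
p_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)
print(round(p_between, 4))  # 0.6827
```

The unrounded result is 0.6827. Subtracting four-decimal table entries (0.8413 and 0.1587) gives 0.6826 instead; the small gap comes only from rounding in the table.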
Our empirical rule says that between one standard deviation above and below the mean, we have 68% of our data in the population. And so if we were to write this as a probability statement, we're going to take the bigger area, the probability of z being less than positive one, so that's from here all the way into the negative tail, and if we look that up on our distribution tables, that gives us 0.8413 to four decimal places. Then we subtract the smaller area, which is from here, negative one, all the way into the tail, which is 0.1587. So the bigger area minus the smaller area gives us the area between the two, and that's 0.6826, which is approximately 68 percent, which corresponds to our empirical rule. Obviously, with the empirical rule we're rounding a little bit when we say it's 68 percent; going by the table values, it's closer to 68.26. So just make sure you verify these two values with your actual distribution tables.
Now, just a real-life example of taking a value and converting it to a z-score. This is from January 2010, when the world's tallest living man and the world's shortest living man met, and you can see them in the photo together. So Sultan is 251 centimeters tall, and as of when I'm recording this, I believe he is still living and still the world's tallest living man. The world record for tallest man was 272 centimeters. And then we've got He, who was at the time 74 centimeters tall, the shortest living man. I do believe, unfortunately, he passed away later that year. But what we can do is compare ourselves to that. So this is a picture of me, as you can evidently tell, and, to scale, if I were to stand next to Sultan I'd probably be around his hip height. So I definitely wouldn't want to be standing right next to him. And then, also to give you a bit of an idea, this is Toby, my chihuahua.
He's a little bit of a porn star in this photo, so I decided to make him a little bit more modest and cover up his bits. But he is the same height as He over here. So you can see that they differ quite a lot from what the general population for height would be. What we can do is take these heights, convert them to z-scores, and see what the probability of having those heights would be in the normal population distribution for height. So if we take Sultan's height and convert it, we get a z-score of 9.5. It's positive because he is taller than the mean. And then when we convert that to find the probability, I actually wasn't able to find a probability calculator that could go as far as a z-score as big as 9.5. They all just stopped and reported "less than" some value, and the smallest probability value I could get was 2 times 10 to the negative 4, which is extremely small. So in the normal population, someone as tall as 251 centimeters is extremely, extremely unlikely. The probability is very small. Now, we can see with He over here that his z-score is even bigger in magnitude. Yes, it's negative; he's much shorter than the average male height, but again I couldn't get an accurate probability. The probability of a z-score being negative 12.6 is extremely small. Now, obviously, they don't form part of the normal population for height, because there would be hormonal and genetic reasons for their extremes in height. But this is basically what we do when we take babies' heights at different ages and check that they're developing at the rate we are expecting. We talk about babies being on different percentiles: "Oh, my baby's on the 40th percentile for height," or "My baby's on the 95th percentile for head circumference." So we position
our kids and check their development based on the normal distribution, trying to see if they're in the ranges that we would expect, and it's all based on the concept of a normal distribution.
So I think I kind of alluded to this, but I just wanted to make mention of it. When we first encountered the empirical rule, we talked about 95% being within two standard deviations. But it actually corresponds more accurately to a value of 1.96. And I know the difference between 2 and 1.96 might sound very trivial. I only bring it up because you might encounter some instances where it's referred to as 2 and other times as 1.96. 1.96 is more accurate, and it is the way we're going to refer to it in the future, but a lot of textbooks and websites just refer to it as 2. When we look that up on a distribution table, you'll see why 1.96 is used: when we go to four decimal places, we see that exactly 0.9500, 95 percent, falls within 1.96 standard deviations above and below the mean. So moving forward we'll assume it's 1.96, but you may also see 2 used occasionally.
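As a quick check on the 1.96 versus 2 point, and on the extreme z-scores from the height example, here is a minimal Python sketch. The helper names `central_area` and `upper_tail` are made up for illustration, and it assumes Python 3.8 or later for `statistics.NormalDist`. The far-tail helper uses `math.erfc` rather than subtracting a left-hand area from 1, since that subtraction loses precision when the tail is tiny.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()


def central_area(k):
    """Area within k standard deviations either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def upper_tail(z):
    """P(Z > z), computed with erfc so tiny tail areas are not rounded to 0."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(round(central_area(1.96), 4))  # 0.95
print(round(central_area(2.0), 4))   # 0.9545

# z value that leaves 2.5% in the upper tail.
print(round(standard_normal.inv_cdf(0.975), 2))  # 1.96

# Tail areas for the z-scores quoted in the height example.
# By symmetry, P(Z < -12.6) equals P(Z > 12.6).
print(upper_tail(9.5))
print(upper_tail(12.6))
```

Both tail areas come out many orders of magnitude below 2 times 10 to the negative 4, which fits the lecture's point that such heights are extremely unlikely under a normal model of the general population.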