Transcript for:
Empirical Rule and Normal Distribution
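The arithmetic in this video can be sketched in Python: marking off the mean plus or minus 1, 2, and 3 standard deviations, converting a value to a z-score, and adding up the empirical-rule section areas. This is a minimal sketch, assuming Python 3.8+ for the standard-library `statistics.NormalDist`. The helper names (`cutoffs`, `z_score`, `within_k_sd_empirical`, `right_tail_empirical`, `exact_within_k_sd`, `exact_right_tail`) are hypothetical and chosen for illustration. The mean 5.067 and standard deviation 0.331 are the wrist-circumference numbers from the video, treated as if they described all women, as the video does.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# Sample statistics from the video (women's wrist circumference, inches),
# treated as if they described all women, as the video does.
MEAN = 5.067
SD = 0.331

# Rounded empirical-rule areas for one side of the curve, moving outward:
# [mean to +1s, +1s to +2s, +2s to +3s, beyond +3s]. The left side mirrors these.
BAND_AREAS = [0.34, 0.135, 0.0235, 0.0015]


def cutoffs(mean, sd):
    """Marks at mean - 3s, ..., mean + 3s (hypothetical helper name)."""
    return [round(mean + k * sd, 3) for k in range(-3, 4)]


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def within_k_sd_empirical(k):
    """Empirical-rule proportion within k standard deviations (k = 1, 2, 3)."""
    return 2 * sum(BAND_AREAS[:k])


def right_tail_empirical(k):
    """Empirical-rule proportion above mean + k*s (k = 0, 1, 2, 3)."""
    return sum(BAND_AREAS[k:])


def exact_within_k_sd(k):
    """Exact area under the standard normal curve between z = -k and z = +k."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


def exact_right_tail(x, mean, sd):
    """Exact area under the normal curve to the right of x."""
    return 1 - NormalDist(mean, sd).cdf(x)


if __name__ == "__main__":
    print("Cutoffs (inches):", cutoffs(MEAN, SD))
    for k in (1, 2, 3):
        print(f"Within {k} SD: empirical {within_k_sd_empirical(k):.3f}, "
              f"exact {exact_within_k_sd(k):.4f}")
    x = 5.398
    print(f"z-score of {x}: {z_score(x, MEAN, SD):.2f}")
    print(f"Share above {x}: empirical {right_tail_empirical(1):.3f}, "
          f"exact {exact_right_tail(x, MEAN, SD):.4f}")
```

Running it prints the seven cutoffs from the chart, the empirical-rule proportions 0.68, 0.95, and 0.997 next to the exact normal areas (about 0.6827, 0.9545, and 0.9973), and, for the right-tail question about 5.398 inches, the sum of the three upper sections (0.16) next to the exact area (about 0.1587). The comparison shows that the empirical-rule numbers are rounded versions of the exact normal-curve areas.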

Hi everyone, this is Matt, and in Intro Stats today we're looking at the empirical rule. The empirical rule is another famous rule about normal quantitative data. It's also sometimes referred to as the 68-95-99.7 rule, which we'll talk about in a little bit.

Just to remind you again: when we went over normal quantitative data, we said that the center, or average, should be measured by the mean, and the spread, or variability, is measured by the standard deviation. That's the most accurate measure of spread for normal data.

So a question that came up a long time ago was: how do we find percentages for quantitative variables? If you think about it, for categorical data it's pretty simple. We look at how many people have a certain characteristic out of the total; that's how we calculated percentages for categorical data. But what happens if we're dealing with quantitative data? Quantitative data has unique properties that sometimes make it problematic to figure out percentages. Now, if you're just dealing with sample data, you might say, "This many people in my sample had a height of 69 inches or taller, out of the total," so you could at least figure out a sample percentage that way. But if you're dealing with populations, and you want to know what that percentage would be theoretically, it becomes very difficult. That's mainly because quantitative data is continuous, so there are infinitely many possible values: there are infinitely many numbers just between 1 and 3, so there are infinitely many possible weights between one kilogram and three kilograms. That makes it very difficult to use things like "amount out of total." If I asked how many possible weights are over two kilograms, that would be infinite, and the total number of possible values between one and three kilograms
would be infinite, so it becomes very problematic to work with amount out of total.

So mathematicians a long time ago came up with something called probability density curves. Probability density curves are used a lot, and basically they're a way of calculating percentages by finding the area under a curve. Think of it this way: you have your continuous scale along the bottom (maybe this is kilograms or something), and then you draw a curve that matches the shape of the data, which in this case is normal, a nice bell-shaped curve. The total area under the curve is 1, and remember, 1 is equal to 100%, so think of this as the total proportion. Then, if you want to find a percentage or a proportion, you just find the area under the curve over a certain part of the scale. That's how we can figure out proportions for normal data. One thing to keep in mind about these curves, and this is a really important idea that's used throughout statistics: the area under the curve is the proportion, and the quantitative data is the scale along the bottom.

So let's look at an example and work through this a little bit. Suppose we have a company that sells bracelets, and they're interested in the sizes of women's wrists. We looked at some women's wrist circumferences, and we're going to assume they came out normal. In fact, this is one of the data sets we looked at in a previous video: the mean came out to 5.067 inches and the standard deviation was 0.331 inches. Now, I'm going to assume this applies to all women. If that's the case, what kind of percentages could we calculate? If we draw the normal curve here, I put the mean in the middle. Now, I am using the mean of the
sample, x-bar. Sometimes in stats books you will see the Greek letter mu (μ) used for the population mean. Similarly, I used s for the standard deviation, though sometimes you will see sigma (σ) if you're dealing with a population standard deviation.

So the mean is in the middle. If we go one standard deviation above or below the mean, that usually occurs about halfway down the slope, where the curve stops bending like a hill and starts bending like a valley. In calculus we call that an inflection point. That's roughly where one standard deviation falls, and once you've measured that distance, you just repeat it.

Now, if I go one standard deviation above or below the mean and calculate the area under the curve (long before computers, statisticians were already using calculus to calculate areas under curves; calculus is used a lot for that), we find that these two sections right here are each about 0.34. Statisticians noticed this commonality: if you have normal data, about 68% of the data is within one standard deviation of the mean. That's where we get the idea that the middle 68% of the data are the typical values, if you remember. And again, these two sections are each 0.34, and that's always the case as long as we have a normal shape.

Now if we go out to two standard deviations, x-bar plus 2s, there is less area in that section; it turns out that area is about 0.135. Between two and three standard deviations it's about 0.0235, and beyond three standard deviations there's hardly anything, about 0.0015. These proportions are always the same for normal data. That's why it's
called a rule: it's the same percentage each time. The numbers along the bottom will change depending on the situation, but the proportions in each section stay the same. That's how you want to think about it, and that's where it gets the name empirical rule. If you add up the middle two sections, you get 68%, or 0.68. If you add up the middle four sections, between two standard deviations below and two standard deviations above the mean, that adds up to 0.95, or about 95%. Again, we already knew these numbers from calculus, but nowadays computer software can calculate them for you. In fact, we'll do another video where I show you how to calculate all these proportions using computer software.

If I go three standard deviations away, that covers the middle six sections. If I add up 0.0235 + 0.135 + 0.34 + 0.34 + 0.135 + 0.0235, it comes to about 99.7%, or 0.997. This is where the empirical rule gets its other name, the 68-95-99.7 rule: within one standard deviation of the mean is about 68%, within two standard deviations is about 95%, and within three standard deviations is about 99.7%.

Now let me fill this chart out a little so we can understand women's wrist circumference better. I put the mean in the middle. One standard deviation above would be 5.067 + 0.331, and that's how I got 5.398 right there. Now I'm going to multiply the standard deviation by two and then
add it to the mean, and that gives 5.729. Multiply the standard deviation by three and add it to the mean, and we get 6.060. Similarly, the mean minus one standard deviation is 4.736, minus two standard deviations is 4.405, and minus three standard deviations is 4.074. If you're doing this by hand, like in the old days before computers, you'd put the mean in the middle, subtract the standard deviation three times going left, and add the standard deviation three times going right. That's an easy way to do it.

Now, keep in mind that we did z-scores recently. A z-score is how many standard deviations you are above or below the mean. Suppose you're at the mean plus 1s: you're one standard deviation above the mean, so if we use the z-score formula we learned last time, this number right here has a z-score of 1. This number is two standard deviations above the mean, so it has a z-score of 2, and this number is two standard deviations below the mean, so it has a z-score of -2. So you can see how the z-scores line up with this chart. By the way, if you label these percentages with z-scores instead, you get something called the standard normal distribution. The standard normal distribution is the distribution of z-scores, and it shows you the percentages for normal data in terms of z-scores. In the old days, you'd calculate the z-score and then look up the percentage that goes with that z-score in a table. In modern statistics we use computers: you can have a computer calculate the percentage for you.

So let's look at what questions we could answer from this chart. If we can see this chart, we could actually answer quite a bit of
questions about women's wrist circumference. What percent of women have a wrist circumference greater than 5.398? Well, 5.398 is right here; it's one standard deviation above the mean. We're looking for everything greater than that, which is the right tail. It's really important that you start getting used to dealing with tails; this is a very common idea that will carry you throughout statistics. When you see "greater than," I want you to think "right tail," because that's where the bigger numbers are. So: right tail, not left tail. We're looking for all of the sections above 5.398. If you notice, there are three of them: 0.135, 0.0235, and 0.0015. To get the answer, we just add up those three sections, because they're all above the cutoff of 5.398. So my answer would be 0.135 plus point