Transcript for:
Understanding Standard Deviation and Outliers

Calculating the standard deviation is going to be really important moving forward in this course, and we're going to need to calculate it a lot. But we're going to learn one application in particular today: one thing we can do with the standard deviation is use it to find outliers.

An outlier is just a data value that lies outside the usual range of values in the data set. Finding outliers can be important because it helps us know where to look: to check whether there's been a mistake recording a data value, or whether a data value is especially interesting.

The example I give in the book is a list of prices for a small cup of coffee. Maybe one of the prices is $2.39, another is $2.99, another is $3.09, and then another is 259, no decimal point. Now, do we really think that cup of coffee is 259 dollars when all of these other ones aren't? Well, probably not. It's probably a mistake; we probably forgot the decimal there. But maybe it is. Maybe it's an especially interesting data point: a luxury cup of coffee made with coffee beans grown in some special way in some special place, extremely rare, and cooked by the coffee masters of, I don't know. Maybe it is an especially expensive cup of coffee, but probably not. Either way, it's a data point that's worth looking at more carefully.

So how do we find these outliers? In that case, the 259 was pretty obvious, but sometimes they're not obvious. So how do we decide when a data value is an outlier and when it is not? There isn't a hard and fast rule; different statisticians do this differently in different circumstances. We're going to learn a simple rule for finding outliers that we're going to apply in this class. In this class, we
will define an outlier as a data value that lies more than two standard deviations away from the mean. What does that mean? Let's take a look at an image that will help us see what's going on here.

This is the one from the textbook that I really like. It uses the notation for the population mean and population standard deviation. If you replace the symbols with the corresponding symbols for a sample, the principle still applies: if you swap out all the μ's with x̄'s and all the σ's with s's, so that we're talking about the mean and the standard deviation of the sample, the idea here still holds.

The mean is kind of in the middle of the data. Your standard deviation tells you the average, or standard, amount that data deviates from that mean: the average distance, so to speak. So if we go one standard deviation away from the mean in either direction, it's not too uncommon to find data in this area. One standard deviation away from the mean isn't that far; it's the average amount we are away from the mean. One standard deviation above the mean is μ + σ, or μ + 1σ if you want to really emphasize that it's one standard deviation. And if we go one standard deviation in the other direction, we're subtracting: μ - σ.

What happens if we go two standard deviations away from the mean? Remember, the average distance away from the mean is one standard deviation. So if we go two standard deviations, there are some data values that might be out that far, but now they're starting to get a little rare. That's μ + 2σ and μ - 2σ: two standard deviations above the mean and two standard deviations below the mean. As it turns out, almost all data lies in this range, within two standard deviations of the mean. Almost all data lives here. We define an
outlier as something that goes beyond that range. So this value right here, these data points in here, they are outliers because they are more than two standard deviations away from the mean. That makes them unusual: they are really far out from that center value, the mean, and so we count those as outliers.

So to find an outlier, once we find the mean and the standard deviation, we find these boundaries. We look at two standard deviations above the mean and two standard deviations below the mean; those are the boundaries. Any data value between those boundaries in our data set would not be an outlier, so we are looking for data values that are not between those numbers. Data values not between those numbers are considered outliers, and that's going to be how we find our outliers in the next example.
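
The rule described above (find the mean and the standard deviation, compute the two boundaries, then look for values outside them) can be sketched in a few lines of Python. This is only an illustration, not the textbook's example: the function names `outlier_bounds` and `find_outliers` are made up here, the price list is hypothetical (only 2.39, 2.99, 3.09 and 259 come from the lecture), and it assumes the sample standard deviation s is the one wanted.

```python
from statistics import mean, stdev


def outlier_bounds(data):
    """Return (lower, upper): the mean minus and plus two standard deviations."""
    x_bar = mean(data)
    # Assumption: sample standard deviation (s). Use statistics.pstdev
    # instead if the data is a whole population (sigma).
    s = stdev(data)
    return x_bar - 2 * s, x_bar + 2 * s


def find_outliers(data):
    """Return the values that are NOT between the two boundaries."""
    lower, upper = outlier_bounds(data)
    return [x for x in data if x < lower or x > upper]


# Hypothetical coffee prices; the last one is missing its decimal point.
prices = [2.39, 2.99, 3.09, 2.59, 2.79, 2.89, 3.19, 2.49, 2.69, 259.0]

lower, upper = outlier_bounds(prices)
print(f"boundaries: {lower:.2f} to {upper:.2f}")
print(find_outliers(prices))  # [259.0]
```

One thing worth noticing: with only the four prices mentioned in the lecture, the 259 would actually land inside the boundaries, because a single extreme value in a very small data set inflates the standard deviation a lot. That is why the sketch uses a longer made-up list.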