Transcript for:
Outliers and Box Plots

So our high outliers would be anything greater than or equal to quartile 3 plus 1.5 IQR. Quartile 3 was 13, so that's 13 plus 1.5 times 5.5. If I grab a calculator here, you can do this really quick. I'm not going to use calculators very much, I usually use computer programs, but for this you can use a calculator. I'm going to do the multiplication first: 1.5 times 5.5 gives us 8.25. If I add 13 to it, I get 21.25. So any values in the data set that are above 21.25 are going to be considered high outliers, or unusually high values. Well, are there any? Yes, there is, right here, this one. So this one right here is going to be a high outlier. That tells me that the one data value that was above the outlier cutoff of 21.25 is now considered an unusual value, so $33.50. There were no values that were exactly $21.25, but I do know that $33.50 in the actual data is an outlier.

Now what about low outliers? With low outliers, we said the formula is anything less than or equal to quartile 1 minus 1.5 IQR, so 7.5 minus 1.5 times 5.5. Again, 1.5 times 5.5 gives us 8.25. Subtract that from 7.5 and we get negative 0.75. Now, it's okay, you can get negative cutoffs. That just asks you: were there any numbers lower than negative 0.75 in the data set? No. That means there were no low outliers. There are no numbers in the data set that were below the low outlier cutoff. In other words, $6.50 is not an outlier. It's not unusual for somebody to spend six dollars and fifty cents in the store. So this data only really had one high outlier, the $33.50. By the way, that's also what would probably make this data very skewed right, because of this one really high outlier here.

Now there is a graph that goes with this data, and it's called a box plot, a very famous graph. Box plots are very, very famous, and so I want to try to make a
box plot here, just so you can get the idea of how a box plot works. A box plot is sort of a graph of the quartiles, and it also shows you outliers. So just a couple of things. The data goes from 6.5 to 33.5, so I'm just going to make a number line for that. We'll kind of go this way. Maybe I'll do, here's $5: 5, 10, 15, 20, 25, 30, 35. How's that? Just going to make a rough number line here.

Now, in a box plot, one of the first things they do is draw a box in between quartile 1 and quartile 3. I just want to check one thing really quick. Yeah, so the quartiles, I'm checking that I did those okay, and it looks like we're okay. Quartile 1 again was 7.5, so at 7.5, which is about right here, I'm going to draw this part of the box, the far left. At 13, which is about right here, I'm going to draw the far right of the box. By the way, you can draw box plots vertically also; you can have the scale vertical and have the box going vertically. So again, the far left of the box is always quartile 1, and the far right of the box is always quartile 3.

Now they'll draw a line inside the box at the median, the average, of 8.75. By the way, don't think that this has to be in the middle of the box. It's not. This is not based on distances like the mean is; it's based on just the numbers in order. So quartile 2, or the median, could really fall anywhere in the box. So when you see the box, think of the line in the box as the average, and the length of your box is actually the IQR. So the length of your box is your spread, your typical spread.

Now, this is often called a box and whisker plot because it actually has whiskers on the box. I think it came from a cat with whiskers or something, it reminds people of
that. So this is what we call a box and whisker plot, or just box plot for short. The whiskers go to the highest and lowest values in the data set that are not outliers. They have to be numbers in the data that are not outliers. So if I was looking at this, I might say, okay, the lowest number in the data is 6.5, and that was not an outlier, so the bottom whisker has got to go to 6.5. Again, a whisker always goes to a number that's not an outlier.

Now, the highest number in the data set, the maximum of the data set, was 33.5, but my whisker is not going to go to that, because it was designated as an outlier. So the computer will now pick a new, I'm sorry, a new maximum. What's the next biggest number in the data set? It has to be a number in the data set. The high whisker is going to be the highest number in the data set that's not an outlier, so in our case that's going to default to 14.75. That's now going to be the top whisker. So my bottom whisker is going to go to 6.5, kind of a very small whisker there, and then the top whisker is going to go to 14.75, just a hair under 15. These little lines sticking off the ends of the box are called the whiskers, and they go to the highest and lowest numbers in the data set that are not outliers.

Now what about the outlier? The box plot actually shows you outliers. Each computer program uses a different symbol, though. Some computer programs use little circles, some use little triangles, some use little stars. It just depends on the computer program. So 33.5 is a high outlier, and what the computer will do is draw a little triangle or a little star here at 33.5. So if you see a little star or a little circle or a little triangle, that just
means that's an outlier. And what you can do when you make a box plot in the computer is just hold your cursor on those little stars or little circles, and it'll tell you what all the outliers are. You could have like 20 outliers; just hold your cursor on each one and the computer will tell you what each of those outliers was. So it's very useful. I like box plots a lot because they actually give you the outliers, which is very nice.

Now, people always ask me, should I use the box plot to find the outliers for normal data? Well, it's tricky. I've seen people do that, and sometimes it works out okay, sometimes it agrees. But remember that in a calculation involving normal data, we're usually using the mean and standard deviation. That's why I was showing you last time how to calculate the outliers from the mean and standard deviation. The calculation that the box plot is using is based on the quartiles, so it's not really using the mean and standard deviation. So I like to think of box plots as going with non-normal data. I use box plots for non-normal, or skewed, data.

Now, could I figure out the shape from this graph? Well, that can be really tricky. A lot of times I much prefer looking at a dot plot or a histogram to determine shape. But you can probably get a good idea that this is skewed right. If you think about it, the median is the average, right? This line right here is the average, so think of that as where the center would be, where the highest bar would be. If I go to the left of the center, see how it's a very small left tail. But if I go from the center all the way to the high outlier here, that's a very long right tail. So we're pretty sure that this is going to be a skewed right data set, because the distance from the center is a lot farther to the right than it is to the left. So you can sort of get an idea of shape from box plots,
though I do prefer looking at a histogram to get shape. Okay, so this is called a box plot, or a box and whisker plot, and it's a great graph to make. It shows you visually your whole data in terms of spread: the IQR is the length of your box, the line inside is your average, and it also shows you all the outliers. So it's a very useful graph.

Okay, so let's recap. For non-normal or skewed data, we want to use the median as our average and the IQR as our spread. Typical values are going to be between quartile 1 and quartile 3. And to find outliers, just have the computer make a box plot. You actually don't have to calculate these yourself; the computer does it automatically. Just have the computer make a box plot and then look for these little stars, or little triangles, or little circles. These are all symbols for outliers.

All right, well, hopefully that was helpful for you. Like I say, we'll have some more videos on using the computer to calculate all of this, but I hope this was helpful for you. So again, this is Matt to show and intro stats, and I will see you next time.
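
The cutoff arithmetic and the whisker rule from this video can be sketched in a few lines of Python. This is only an illustration, not something shown in the recording: the function names are made up, and the list holds just the three dollar amounts mentioned out loud (6.5, 14.75, 33.5), since the full data set isn't given in the transcript. It follows the "greater than or equal to" and "less than or equal to" wording used here; some programs use strict inequalities instead.

```python
def outlier_cutoffs(q1, q3):
    """Return (low_cutoff, high_cutoff) from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers_and_whiskers(values, q1, q3):
    """Split values into outliers and find where the whiskers end.

    Whiskers go to the lowest and highest values that are NOT outliers.
    Assumes at least one value lies strictly between the cutoffs.
    """
    low, high = outlier_cutoffs(q1, q3)
    outliers = [v for v in values if v <= low or v >= high]
    inside = [v for v in values if low < v < high]
    return outliers, (min(inside), max(inside))


# Quartiles stated in the video: Q1 = 7.5, Q3 = 13 (so IQR = 5.5).
q1, q3 = 7.5, 13.0
low_cutoff, high_cutoff = outlier_cutoffs(q1, q3)
print(low_cutoff, high_cutoff)  # -0.75 21.25

# Hypothetical partial list: only the values named in the video.
mentioned_values = [6.5, 14.75, 33.5]
outliers, whiskers = outliers_and_whiskers(mentioned_values, q1, q3)
print(outliers)   # [33.5]
print(whiskers)   # (6.5, 14.75)
```

With the real data set in place of the three-value list, the same two functions would give the cutoffs, the outlier list, and the whisker endpoints that a box plot program draws.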