In this video we will be looking at the five-number summary, box plots, and outliers.

The five-number summary gives us a way to describe a distribution using only five numbers: the minimum, the first quartile, the median, the third quartile, and the maximum. So if we took a sample and measured some random quantitative variable, we could order the values from smallest to largest and use the five-number summary to describe the distribution.

The minimum is the smallest value in a data set and the maximum is the largest value in a data set. The median is the middle data value. It is the point at which 50% of the data values are below the median and 50% of the data values are larger than the median.

Now, the median of the bottom half is called the first quartile. It is the position where 25% of the data values are below it and 75% of the data values are larger than it. The first quartile is essentially a median within the lower half of the data. The same thing can be said for the third quartile: the median of the top half gives us the third quartile, and it is the position where 75% of the data values are below it and 25% of the data values are larger than it. The five-number summary also gives us a way to divide the data into four equal quarters.

So let's determine the five-number summary for the following data set. We'll start with the median. To find the median you can look for it visually, and you should find that the median is equal to 33. You can also use the formula to find the position of the median, and we find that it is in the eighth position, where the value is 33.

Now, to find the first quartile we can use the same formula to find the position of Q1, except this time n refers to the number of data values below the median. There are seven data points below the median, so n is equal to 7, and we find that the first quartile is in the fourth position. So we count to the fourth position and we see that Q1 is equal to 25.

To find the third quartile we will do the same sort of thing, except n refers to
the number of data values that are above the median. The two halves always contain the same number of values, so you should find that Q3 is also in position four. So we count four positions above the median and we find that Q3 is equal to 36. Now, the minimum is the smallest number and the maximum is the largest number, so as a result this is our five-number summary: 10, 25, 33, 36, and 59.

We can then take these five numbers and make something called a box plot. A box plot gives us a visual representation of the five-number summary, and it looks something like this. Each vertical line on the box plot represents a number from the five-number summary. The horizontal lines that extend out from the box are called whiskers, and the box itself represents the interquartile range. The interquartile range, or IQR, refers to the middle 50% of an ordered data set, and it is equal to the third quartile minus the first quartile.

We can also have something called a modified box plot. It's like a regular box plot except it accounts for outliers. Sometimes outliers in a data set aren't that obvious. However, we can mathematically check if a data set has outliers in it. We say that a data value is considered to be an outlier if it is less than Q1 minus 1.5 times the IQR, or if it is greater than Q3 plus 1.5 times the IQR.

So if you remember the following data set, we had calculated the five-number summary to be 10, 25, 33, 36, and 59. To check for outliers we first need to calculate the interquartile range. We found that Q3 is equal to 36 and that Q1 is equal to 25, so the IQR is 36 minus 25, which gives us an answer of 11.

Now, we said that a data value is an outlier if it is less than Q1 minus 1.5 times the IQR or if it is greater than Q3 plus 1.5 times the IQR. At this point we can start substituting values: Q1 is 25, Q3 is 36, and the IQR is 11. The lower cutoff is 25 minus 16.5 and the upper cutoff is 36 plus 16.5, and so we say that a data value is considered to be an outlier if it is less than 8.5 or if it is greater than 52.5. If we look at our data set we see that no values are less than 8.5; however, we do have a value that is greater
than 52.5. Therefore we see that 59 is an outlier. And so when we make a modified box plot, we draw the outlier as a dot, and the whisker will only extend to the new maximum, which in this case is 50.

So to quickly recap: a regular box plot is drawn using the five-number summary. A modified box plot also uses the five-number summary, but it accounts for outliers, and if there are outliers, one or both whiskers will extend only to the new minimum or maximum.

Similar to back-to-back stem plots, we can have side-by-side box plots. By having them side by side, we can make easy mathematical and visual comparisons between two sets of data.
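
To make the steps from this video concrete, here is a minimal Python sketch of the same calculation. A few things in it are assumptions rather than something taken from the video: the data values shown on screen are not part of this transcript, so the list `data` is a made-up stand-in chosen only so that it reproduces the numbers quoted above (15 values, summary 10, 25, 33, 36, 59, and a new maximum of 50). The position formula is assumed to be (n + 1) / 2, which agrees with the eighth position for n = 15 and the fourth position for n = 7. The function names are illustrative only.

```python
def median_position(n):
    """1-based position of the median among n ordered values (assumed formula)."""
    return (n + 1) / 2


def value_at(sorted_vals, pos):
    """Value at a 1-based position; a half position averages its two neighbours."""
    lo = int(pos)
    if pos == lo:
        return sorted_vals[lo - 1]
    return (sorted_vals[lo - 1] + sorted_vals[lo]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Q1 and Q3 are the medians of the values below and above the median,
    with the median itself left out when the count is odd.
    """
    vals = sorted(values)
    n = len(vals)
    half = n // 2                    # how many values fall in each half
    lower, upper = vals[:half], vals[n - half:]
    q1 = value_at(lower, median_position(len(lower)))
    med = value_at(vals, median_position(n))
    q3 = value_at(upper, median_position(len(upper)))
    return vals[0], q1, med, q3, vals[-1]


def outlier_cutoffs(q1, q3):
    """Cutoffs from the 1.5 x IQR rule: below the first or above the second."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def modified_box_plot_parts(values):
    """Return (outliers, whisker_min, whisker_max) for a modified box plot."""
    _, q1, _, q3, _ = five_number_summary(values)
    low, high = outlier_cutoffs(q1, q3)
    outliers = [v for v in values if v < low or v > high]
    inside = [v for v in values if low <= v <= high]
    return outliers, min(inside), max(inside)


# Hypothetical data: NOT the video's data set, only consistent with its results.
data = [10, 15, 20, 25, 28, 30, 32, 33, 34, 35, 35, 36, 40, 50, 59]

summary = five_number_summary(data)
cutoffs = outlier_cutoffs(summary[1], summary[3])
outliers, whisker_min, whisker_max = modified_box_plot_parts(data)

print(summary)                              # (10, 25, 33, 36, 59)
print(cutoffs)                              # (8.5, 52.5)
print(outliers, whisker_min, whisker_max)   # [59] 10 50
```

Keep in mind that statistical software often uses a slightly different rule for locating quartiles, so a library function may report Q1 and Q3 values that differ a little from the median-of-each-half method used here.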