When splitting data into quarters, it's sometimes helpful to visualize the result using a box plot, sometimes called a box-and-whisker plot. The idea is that the min, the max, and the three quartiles are drawn with horizontal lines: one here for the max, one each for Q3, Q2, and Q1, and then a line for the minimum. The values Q1, Q2, and Q3 are enclosed in a box, and then there are lines going from Q3 to the max and from Q1 to the min. So this is a box plot. It summarizes how spread out the data is in each quarter. Let me give you a cleaner picture to work with.

Okay, so here's one that's a little bit cleaner; you don't have all the data there. The three lines in the box represent the first, second, and third quartiles of the data. The bottom line is the minimum, and the top line is the maximum. Each section here is a different quarter of the data. Notice that means each section has the same number of data values: each quarter has 25 percent of the data values.

So what is the size telling us? This section here is a lot bigger than, say, this section here. The size is not telling us how many data values there are; each one of these sections has the same number of data values. Instead, it's telling us how spread out the data is. This section here, which looks much more compact, is telling us that the data values in that quarter aren't very spread out, whereas the data values here are going to be much more spread out, because that quarter takes up a much larger part of the box plot. So that's what a box plot tells us: it tells us how spread out the data is. Each section, each quarter, has 25 percent of the data values. It has a quarter of the data values.

Okay, so let's look at a couple of examples. The following data are the heights of 40 students in a statistics class. Construct a
box plot using R. In which quarter is the data spread out the least? In which quarter is the data spread out the most?

Okay, so first off, how do we actually construct a box plot in R? Well, it's very simple. In R, all you do is type boxplot, one word, and then provide the list of data values you want a box plot of, and the computer will do the rest.

So let's take a look at this particular example. Let's gather our data and combine these data values into a list, and let's put our list of data right here, comma separated, in between those parentheses. Let's give a name to our list. These numbers represent the heights of 40 students in the statistics class, so we will call this list Heights. Call it whatever you want, but I like to name my lists and my numbers after what they represent, just as a reminder to me.

We want a box plot of this list of numbers, so I'm going to type boxplot, and all I do is tell it which list of numbers I want a box plot of. I've called my list of numbers Heights, so that's what I say. I'm telling the computer I want a box plot of the list of numbers that I have named Heights. If I evaluate, it's going to give me a bunch of stuff up here that you don't have to worry about, but this picture right here is what we're going for. That is our box plot. From this we can see our minimum value is right here. This line here is our first quartile, this thicker line here is our second quartile, the top line of the box is our third quartile, and then the one at the very top is our maximum value.

Now, what was the question asking us for? It asks us to construct a box plot and then use that to decide in which quarter the data is the least spread out and in which quarter the data is the most spread out. The quarter where the data is the least spread out is the quarter that's the smallest, the most squished together. This quarter here is the smallest, right? We can agree
out of these four quarters, that one right there is the smallest. That's going to be the second quarter. This one's the first quarter, then the second quarter, third quarter, and fourth quarter. The second quarter is between Q1 and Q2.

Which quarter is the largest? Well, if I look at the four quarters, it looks to me like this top one is the biggest. It looks a little bit bigger than all the others, which means the data in that quarter is the most spread out. So this quarter here, which is the fourth quarter, is where the data is the most spread out. So our answer is: the data is the least spread out in the second quarter and the most spread out in the fourth quarter. And just to re-emphasize, these sizes are just telling us how spread out the data is. Both of those quarters, independent of their size, have the same amount of data. All this is telling me is how spread out the data is.

Okay, let's look at one more example. Use R to construct side-by-side box plots for the two sets of data below. Which data set has the larger spread of data? Which data set is more spread out, in other words?

So we're going to have two data sets, and we'll use the computer to construct the two box plots so they're side by side. All we do is call boxplot, and then we just separate the different lists of values with commas. So X1 would be the set of data from our first data set, and X2 would be the set of data from our second data set.

Okay, so let's do this with the calculator. I'm going to grab my first data set and store it in my first list, and I need to give it a name. I'll just call it data one, for the first data set. Then I'm also going to want my second data set, so I copy that and combine it into a list: I paste those numbers in between the parentheses, and I need to give this list a name. This is my second data set, so I'll call it data two. And now we need to create a box plot with both of these side by side.
Okay, to do this side by side, we call the boxplot function again, but instead of just giving it one list of data, I'm going to put a comma and tell it what the second list of data is too. That way it's going to create two different box plots, one box plot for this set of data and one box plot for this set of data, and they'll be side by side, so they'll be easy to compare. If I click evaluate, I get a bunch of information up here, which I don't care about that much. I'm most interested in this picture down here, so I'm going to get that image and paste it in here.

Okay, so I have my box plot for data set one and my box plot for data set two. The question is which data set is more spread out. We can see data set one looks a lot more spread out generally than data set two. There are places where data set one is going to be really compact; right here, all the data is really close together. But generally, if we look from the minimum to the maximum, data set one is more spread out than data set two.

One note: notice these three little dots right here. Those are data values in data set two that are especially extreme. When creating box plots in R, rather than making the lowest value the minimum or the highest value the maximum, sometimes there are values that are much more extreme than the other values in the data set, and R will exclude those from the main box plot and just include them as separate dots, to say, hey, these are also values, but they're extreme enough that we didn't want to include them in the main box plot. So that's what those dots mean.
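
As a rough sketch of the two examples in R: the data values below are made-up placeholders, since the actual heights and the two data sets are not reproduced in this transcript, and the names data1 and data2 are only illustrative guesses at "data one" and "data two". The sketch draws one box plot, measures the width of each quarter from the five-number summary, then draws two box plots side by side and lists the values that R would plot as separate dots.

```r
# PLACEHOLDER data: not the lecture's actual heights, just illustrative values.
Heights <- c(60, 62, 64, 64, 65, 66, 68, 70, 73)

# One box plot: pass the vector of values to boxplot().
boxplot(Heights, ylab = "Height")

# Five-number summary: min, Q1, median (Q2), Q3, max.
# fivenum() gives the same box edges (hinges) that boxplot() draws.
five <- fivenum(Heights)

# Width of each quarter is the gap between consecutive summary values.
# A wider quarter means more spread, not more data values.
quarter_widths <- diff(five)
names(quarter_widths) <- c("first", "second", "third", "fourth")

least_spread <- names(which.min(quarter_widths))
most_spread <- names(which.max(quarter_widths))
print(quarter_widths)
cat("Least spread out:", least_spread, "quarter\n")
cat("Most spread out:", most_spread, "quarter\n")

# PLACEHOLDER data for the side-by-side example (hypothetical names and values).
data1 <- c(10, 18, 19, 20, 25, 35, 45, 50, 60)
data2 <- c(20, 21, 22, 22, 23, 24, 24, 25, 40)

# Side-by-side box plots: separate the vectors with commas.
boxplot(data1, data2, names = c("data1", "data2"))

# Overall spread from minimum to maximum for each data set.
overall_spread <- c(data1 = diff(range(data1)), data2 = diff(range(data2)))
print(overall_spread)

# Values drawn as separate dots: by default, anything more than
# 1.5 times the box height beyond the edges of the box.
outliers2 <- boxplot.stats(data2)$out
print(outliers2)
```

With real data, only the c(...) lines would need to change. One caveat on the design choice: fivenum() matches the box that boxplot() draws, while quantile() with its default settings can give slightly different quartile values.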