In this video I will talk about measures of position. The objectives are how to find the first, second, and third quartiles of a data set, how to find the interquartile range of a data set, and how to represent a data set graphically using a box-and-whisker plot.

Quartiles approximately divide an ordered data set into four parts. You have the first quartile, denoted Q1: about one quarter of the data fall on or below Q1. The second quartile, Q2, is your median: about one half of the data fall on or below Q2. The third quartile, Q3, is where about three quarters of the data fall on or below Q3.

So let's look at an example. Each year in the U.S., automobile commuters waste fuel due to traffic congestion. The amounts, in gallons per year, of fuel wasted by commuters in the 15 largest U.S. urban areas are listed. We're going to find the first, second, and third quartiles of the data set and see what we observe. So this is the data set; as listed, it goes from 20 all the way to 35, and you have to put it in order. So we're going to go ahead and put the data in order and find the overall median of that data set. As you can see, 25 is the median of the data set; that's what splits the data into two halves. Now we're going to look at all the values that fall below the median, 25. That half goes from 11 to 25; as you can see, I have marked it in red. We're going to find the median of that half, and that median is your Q1. So, looking at that first half, Q1 is 23. That's your first quartile. Then we're going to look at all the data values that come after 25, the overall median, and find the median of that half. That will be your Q3, and in this case it is 30.
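The median-of-halves procedure just described can be sketched in Python. This is a minimal sketch, not the lecture's own tool: the function name `quartiles` is made up here, and it assumes the common textbook convention of leaving the middle entry out of both halves when the number of entries is odd. The transcript does not reproduce the full data listing, so the `gallons` values below are hypothetical, chosen only so that they give the summary numbers quoted in the lecture.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of entries, the middle entry is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)                 # quartiles need ordered data
    n = len(data)
    half = n // 2
    lower = data[:half]                   # entries before the middle position
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

# Hypothetical data (NOT the lecture's actual listing): 15 values built to
# match the lecture's summary numbers (min 11, Q1 23, median 25, Q3 30, max 35).
gallons = [20, 30, 23, 25, 11, 27, 22, 31, 24, 33, 26, 23, 29, 25, 35]

print(quartiles(gallons))  # (23, 25, 30)
```

With 15 entries, the overall median is the 8th ordered value, Q1 is the median of the first 7 values, and Q3 is the median of the last 7.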
So what does this tell us? In about one quarter of the large urban areas, auto commuters waste 23 gallons of fuel or less; if you go back, 23 was our Q1. In about one half, they waste 25 gallons or less; that's our overall median, which is Q2. And in about three quarters, they waste 30 gallons or less; that's our Q3.

From this information we can calculate the interquartile range, the IQR. It's a measure of variation that gives the range of the middle portion, or about half, of the data. To calculate the IQR, you take the difference between the third and first quartiles: Q3 minus Q1. In this example Q1 was 23 and Q3 was 30, so if you subtract those you get 7.

All of this information can be obtained in StatCrunch, so let's go to StatCrunch. I already have my data listed. I'm going to go to Stat, Summary Stats, Columns, and I want to pick the gallons column. The information that I want is Q1 and Q3, so here's Q1, here's Q3, and I also want the IQR, so I'm going to hold the Control key and click IQR. All of this will show up over here in the right box. I hit Compute and I get 23, 30, and 7, and that's what we got before: Q1 is 23 and Q3 is 30.
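As a quick check of the subtraction, here is a small Python sketch of the IQR calculation. The function name `interquartile_range` is just an illustrative choice; the inputs are the Q1 and Q3 values from the lecture's example.

```python
def interquartile_range(q1, q3):
    """IQR = Q3 - Q1, the spread of the middle half of the data."""
    return q3 - q1

# Q1 and Q3 as found in the lecture's fuel example.
iqr = interquartile_range(23, 30)
print(iqr)  # 7
```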
The IQR is 7. Recall I said Q2 is the median, so if you wanted to calculate the median, you just pick the word "median", which is right here. So if I do it again: I want median, Q1, Q3, and IQR. Hit Compute, and there you go; you have your Q2, which is the median.

We can use the IQR to identify outliers. To do this, first you find Q1 and Q3, then you calculate your IQR, and then you multiply the IQR by 1.5. Any data entry less than Q1 minus 1.5 times the IQR is an outlier, and any data entry greater than Q3 plus 1.5 times the IQR is an outlier.

So let's go back to our example and see if there are any outliers in the data set. We already calculated the IQR to be 7. I'm going to take 1.5 and multiply that by 7, and I get 10.5. I'm going to subtract the 10.5 from Q1, which is 23; that gives me 12.5. I'm also going to add the 10.5 to Q3, which is 30; that gives me 40.5. So any data entry less than 12.5 is an outlier and any data entry greater than 40.5 is an outlier. I put our data set back up here, and as you can see 11 is less than 12.5, so 11 is considered an outlier. So in large urban areas, the amount of fuel wasted by auto commuters in the middle half of the data set varies by at most 7 gallons, which is the IQR. Notice that the outlier, 11, does not affect the IQR.

The next thing we're going to do is use this information to draw a box-and-whisker plot. This gives you a visual that highlights important features of a data set, and it requires a five-number summary. The five-number summary is your minimum data value, your maximum data value, and of course your quartiles: Q1, the median Q2, and Q3. We'll use that information to draw a box plot. To draw the box plot, you use Q1 and Q3 to form the box. If you look at the picture, I have Q1 here and Q3 here, and then we draw our box. The median will be a line somewhere inside the box; it could be directly in the middle, or it can be closer to one of the edges; it depends on your scale. Next we draw the whiskers. The whiskers go out to your minimum value and your maximum value. If we go back to our example, our minimum value was 11 and our maximum value was 35, and we have the quartiles and our median. So when you draw this plot, as you can see, Q1 and Q3 form the box, the median inside the box is 25, and the whiskers go out to the smallest value, which is 11, and the largest value, which is 35.

The box itself represents about half the data, which are between 23 and 30. The left whisker represents about one quarter of the data, so about 25 percent of the data entries are less than 23. The right whisker represents about one quarter of the data, so about 25 percent of the data entries are greater than 30. Also, the left whisker is much longer than the right one, which indicates that the data set has a possible outlier to the left, and we already said that 11 is considered an outlier, so that's an outlier to the left.

We can use StatCrunch to draw a box plot. We go to Graph, then Boxplot, and we want to graph the data in the gallons column. You can draw it so it shows the outliers, or you can draw it without them. I'm going to uncheck this box where it says "Use fences to identify outliers", and I'm going to draw my box horizontally, just like the one you saw on the slide. I hit Compute, and this is the box plot of the data. If you put your mouse over parts of the box plot, it gives you information: your IQR, your minimum, your maximum, the median, and the quartiles, 23 and 30.
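The five-number summary and the 1.5 times IQR fences discussed above can be sketched together in Python. This is an illustrative sketch under stated assumptions: the function names `five_number_summary` and `find_outliers` are made up, the quartiles use the median-of-halves convention with the middle entry excluded for odd counts, and the `gallons` values are hypothetical stand-ins built to match the lecture's summary numbers, since the transcript does not include the actual listing.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # below the middle position
    upper = data[half + 1:] if n % 2 else data[half:]   # above the middle position
    return data[0], median(lower), median(data), median(upper), data[-1]

def find_outliers(values):
    """Return (lower fence, upper fence, entries outside the fences)."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)                              # 1.5 times the IQR
    low_fence, high_fence = q1 - step, q3 + step
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers

# Hypothetical data (NOT the lecture's actual listing), chosen to reproduce
# the lecture's summary: min 11, Q1 23, median 25, Q3 30, max 35.
gallons = [20, 30, 23, 25, 11, 27, 22, 31, 24, 33, 26, 23, 29, 25, 35]

print(five_number_summary(gallons))  # (11, 23, 25, 30, 35)
print(find_outliers(gallons))        # (12.5, 40.5, [11])
```

The five numbers are exactly what the box plot draws: the box spans Q1 to Q3, the line inside is the median, and the whiskers reach the minimum and maximum when fences are not used.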
As you can see, this left tail is longer. Now let's draw it so it indicates what the outliers are. Go back to Boxplot, pick gallons, and check this box here that says "Use fences to identify outliers". When I hit Compute, this is the graph, and as you can see, if we put our mouse over this little dot, this is an outlier, and it is 11. That's what we said before: 11 was an outlier. So if you want to see what the outliers are, you have to check that box; if not, it won't show you the outliers.

The reason I don't have anything for Excel is that Excel calculates quartiles differently than StatCrunch, and your homework is based on StatCrunch numbers and its algorithm for calculating quartiles. So I don't show you how to do quartiles in Excel, because they may come out a little bit different. They may come out the same, and you can experiment and see. I don't have Excel open, or I would show you. Excel does have commands for quartiles, but like I said, you may not get the same answers as you would in StatCrunch. That's the end of this lecture.
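To see how two quartile conventions can disagree on the same data, here is a small Python sketch using the standard library's `statistics.quantiles`, which offers an "exclusive" and an "inclusive" method. This only illustrates that conventions differ; it is not a claim about exactly which algorithm StatCrunch or Excel uses. The `gallons` values are hypothetical, chosen to match the lecture's summary numbers.

```python
from statistics import quantiles

# Hypothetical data (NOT the lecture's actual listing).
gallons = [20, 30, 23, 25, 11, 27, 22, 31, 24, 33, 26, 23, 29, 25, 35]

# Two interpolation conventions for the same three cut points (Q1, Q2, Q3).
exclusive = quantiles(gallons, n=4, method="exclusive")
inclusive = quantiles(gallons, n=4, method="inclusive")

print(exclusive)  # [23.0, 25.0, 30.0]
print(inclusive)  # [23.0, 25.0, 29.5]
```

Here Q1 and the median agree, but Q3 comes out as 30 under one convention and 29.5 under the other, which is the kind of small difference the lecture warns about.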