Next we have something called the five-number summary. And the five-number summary is the minimum data point, so the lowest value in the data, then Q1, separating the bottom 25% or the bottom quarter of the data, and then the median, remember that's the same as Q2, then Q3, and then the max. So 1, 2, 3, 4, 5. That's the five-number summary.
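If you want to check a five-number summary by computer instead of a calculator, here is a minimal sketch in Python. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, which is one common rule; calculators and software may use slightly different quartile rules. The function name and the sample data are made up for illustration and are not from the lecture.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle entry when the count is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # entries below the median position
    upper = values[half + n % 2:]   # entries above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Made-up data, not the data set from the lecture.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(five_number_summary(sample))  # (3, 6.0, 12, 16.0, 21)
```

The five values come back in the same order as in the lecture: minimum, Q1, median, Q3, maximum.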
And from the five-number summary, we can draw something called a box-and-whisker plot. So the box consists of this middle piece. This is the box here.
And the box goes from Q1 to Q3. So it starts here at Q1 and it ends here at Q3.
And there's a line here in the middle of the box, and that's the median, or Q2. Now one whisker goes down here to the minimum value and another whisker comes out here to the maximum value. And that's why it's called a box-and-whisker plot. So the lines are the whiskers. Obviously the box you can see there.
And what we always do is we always have a scale underneath it, showing the values.
In example 4, the problem is to draw a box-and-whisker plot that represents the data set from example 1. So I pulled up the five-number summary from example 1, and if you want to know how to get the calculator to do it, you can look at pages 124 and 125 in your textbook. So what we're going to do is draw our box-and-whisker plot. We found our five-number summary, so the first thing we need now is this horizontal scale here. We know that it goes from a minimum of 11, so I think I'm going to start at 10. And it goes to a maximum here of 35, so I think I'm going to end at 40. So I know my minimum value is about here at 11, and my maximum value is about here at 35, so I want to try and get that lined up here. And then the median value, well, here's 25, so that's exactly in the middle of the scale. That's the line inside my box.
And then I have Q1 at 23, so that'll be about here. And then Q3, at 30, is up about here. So here's my box. This line indicates Q1. This line indicates Q3.
And this line in the middle is Q2, or the median. Then I draw a line through so the dot at this end is the minimum and the dot at this end is the maximum. Now let's talk about percentiles and other fractiles.
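Before moving on, here is a rough text sketch of the box-and-whisker plot from example 4, using the five-number summary 11, 23, 25, 30, 35 on a scale from 10 to 40. The function name and the characters used for the box, whiskers, and dots are just choices made for this illustration.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum, lo, hi):
    """Draw a rough one-line box-and-whisker plot on a scale from lo to hi."""
    width = hi - lo + 1
    row = [" "] * width

    def col(value):
        # Column index of a value on the scale (one column per unit).
        return round(value - lo)

    for i in range(col(minimum), col(maximum) + 1):
        row[i] = "-"              # whiskers
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="              # box from Q1 to Q3
    row[col(minimum)] = "*"       # dot at the minimum
    row[col(maximum)] = "*"       # dot at the maximum
    row[col(q1)] = "["            # left edge of the box, Q1
    row[col(q3)] = "]"            # right edge of the box, Q3
    row[col(med)] = "|"           # median line inside the box
    return "".join(row)

plot = ascii_boxplot(11, 23, 25, 30, 35, lo=10, hi=40)
print(plot)
print("10        20        30        40")  # the scale underneath
```

The median line at 25 sits closer to Q1 than to Q3, which matches the drawing: the box runs from 23 to 30.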
So in addition to quartiles, which divide the data up into one-quarter pieces, you could have deciles. They divide the data up into ten equal parts instead of four equal parts like the quartiles. Or percentiles, which divide the data into 100 equal parts. So quartiles, obviously, are quarters, or four parts; deciles, ten parts; percentiles, 100 parts. We often use these percentiles in education or health, and they are used to identify unusually high or low values. Oftentimes children's growth measurements are expressed in percentiles.
So if you're in the 95th percentile, then 95% of the population is lower than you, and that's unusually high. And the fifth percentile is pretty low. So notice that the 25th percentile is the first quartile, isn't it? That's the same as Q1, because the quartiles divide the data up into four equal parts.
So Q2, that's the median. It divides the data into the top half and the bottom half, and that's the 50th percentile. I'm not going to have room to write all that down. And of course, Q3, well, that's the 75th percentile, the three-quarter mark. Be sure you understand what a percentile means.
Saying the weight of a six-month-old infant is at the 78th percentile means the infant weighs the same as or more than 78% of all six-month-old infants. In example 5, we're asked to interpret percentiles using an ogive. Remember, an ogive shows cumulative frequencies. The ogive at the right represents the cumulative frequency distribution for SAT scores of college-bound students in a recent year. What score represents the 80th percentile? So you're going to read the graph. It says percentile here on the vertical axis, or the y-axis, so you want to come over here and see where that meets the graph, right there. And then you're going to have to go down and read what the x-axis is, and that looks like about 1250. So that means that approximately 80% of the students had an SAT score of 1250 or less.
In example 6, we have to find a percentile. And this is the data from example 2. So we want to find what percentile corresponds to $34,000. Remember, the tuition costs were given in thousands of dollars.
So 34,000 is the data entry that begins with 34. So it's here, 34,000 right here. So what we have to do is count the number of data entries less than 34. 1, 2, 3, 4, 5, 6, 7, 8. So there are 8 data entries less than 34. Now I've got to remember the total number of all these data points was 25. So in example 2, there were 25 different colleges that we looked at. So the percentile of 34, well, I've got to use a formula now: the number of data entries less than 34, so that's 8, over the total number of data entries, 25, multiplied by 100, because that's the percentile part. I put that in my calculator and I get 32. That means the tuition cost of $34,000 is more than 32% of the other tuition costs, because I have to include all of these values up to 34.
When you know the mean and the standard deviation of a data set, you can measure the position of any individual data entry in the data set with something called a standard score, or a z-score. Now this is very important for the rest of the course, so make sure you understand this.
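Looking back at example 6 for a moment, the percentile calculation can be sketched in Python. The counts 8 and 25 are the ones from the example; the function name and the short list of costs are made up for illustration and are not the example 2 data.

```python
def percentile_of(data, x):
    """Percentile of x: the share of entries strictly less than x, times 100."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

# The counts from example 6: 8 of the 25 entries are less than 34.
print(100 * 8 / 25)  # 32.0

# Made-up tuition costs in thousands of dollars (NOT the example 2 data).
costs = [20, 22, 25, 30, 34, 36, 40, 41, 45, 50]
print(percentile_of(costs, 34))  # 40.0
```

In the made-up list, 4 of the 10 entries are less than 34, so 34 is at the 40th percentile.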
The standard score, or the z-score, is the number of standard deviations a single data value x lies away from the mean. It could be less than the mean, in which case the z-score will be negative, or it could be more than the mean, in which case it will be positive.
And we use this formula. z equals the data value minus the mean divided by the standard deviation. So remember, a score can be positive or negative.
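Written out in symbols, the formula just described looks like this. The symbol μ for the mean is an assumption here; σ for the standard deviation matches the "sigma" used in the lecture.

```latex
z = \frac{x - \mu}{\sigma}
```

Here x is the data value, μ is the mean, and σ is the standard deviation.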
If the z-score is zero, that means the data value is exactly the same as the average, or the mean. So usual scores are between minus 2 and positive 2 standard deviations away from the mean. That means most of the normal scores are here. The mean minus two standard deviations is on this side.
Then there's the mean in the middle, and on the other side there's the mean plus two standard deviations, 2 sigma. Anything that is more than two standard deviations away is unusual.
And if you're more than three standard deviations away from the mean, that is very unusual. In example 7, we're told that the mean speed of vehicles along a stretch of highway is 56 miles an hour. So let me just write down everything I know.
The mean speed is 56 miles per hour. The standard deviation is 4 miles per hour. You measure the speeds of three cars traveling along this stretch. So I'm going to say X1, car 1 is traveling at 62 miles per hour.
Car 2 is traveling at 47 miles per hour, and car 3, data point 3, is 56 miles per hour. Find the z-score that corresponds to each speed, and assume the distribution of the speeds is approximately bell-shaped. We need this assumption in order to interpret the z-scores as usual or unusual. And remember, the formula for z-scores is the data value x minus the mean, divided by the standard deviation.
So I'm going to get the z-score for the first data point. I'll just call that z sub 1. And that's going to be x sub 1 minus the mean, all over the standard deviation. So that's 62 minus 56, all divided by 4. And I can do that on my calculator.
Now, the calculator wants to have parentheses around the entire numerator in order to be correct. So that's what I'm going to do. So 62 minus the mean of 56, all divided by 4. So that's 1.5.
Now let me do z sub 2, the second one. So that's x sub 2 minus the mean, divided by the standard deviation, and the second one is 47 minus 56, all divided by 4. So I'm going to do the same thing with my calculator. 47 minus 56, entire numerator in parentheses, divided by 4. I'm going to get minus 2.25. And z sub 3 I almost don't even have to do.
That's x sub 3 minus the mean, divided by the standard deviation, sigma. And that is 56 minus 56, that's just 0, isn't it? Divided by 4. Now 0 divided by 4, I can put it in my calculator if I want to be sure, 0 divided by 4 is 0. So this tells me that the speed of 62 miles an hour is one and a half standard deviations above the mean.
Kind of normal. A speed of 47 miles per hour is 2.25 standard deviations below the mean. Getting to be unusual.
And a speed of 56 miles an hour is exactly the mean. It doesn't differ from the average at all.
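To double-check the three z-scores from example 7, here is a minimal sketch in Python. The function names are made up for this illustration; the labels follow the rule of thumb from the lecture (more than 2 standard deviations away is unusual, more than 3 is very unusual), which assumes the speeds are approximately bell-shaped.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / std_dev

def describe(z):
    """Label a z-score with the lecture's rule of thumb (bell-shaped data assumed)."""
    distance = abs(z)
    if distance > 3:
        return "very unusual"
    if distance > 2:
        return "unusual"
    return "usual"

mean_speed = 56   # miles per hour
std_dev = 4       # miles per hour

for speed in (62, 47, 56):
    z = z_score(speed, mean_speed, std_dev)
    print(speed, z, describe(z))
# 62 1.5 usual
# 47 -2.25 unusual
# 56 0.0 usual
```

Notice the parentheses around the numerator in `z_score`, the same thing the calculator needs.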