Transcript for:
Understanding Percentiles in Data Sets

In this section we're going to look at ways we can measure the position of data relative to other values in a data set. The first kind of measure we're going to look at is percentiles. Percentiles split a data set into 100 evenly divided pieces. So what does that mean in practice? Let's look at an example.

Suppose we had an exam in our statistics class, and you scored in the 90th percentile on the exam. What does that mean? First off, let's say what it doesn't mean. It doesn't mean that you got 90 on the exam. In fact, it doesn't say anything directly about what your score is on the exam. Instead, it's telling you what the position of your score is relative to the other scores on this exam in the class.

So let's imagine we have all the scores on a number line. We start over here, this is our lowest value, and we go over here, this is our highest value. So this is our minimum value and this is our max value. In terms of our exam, the minimum value would be the very lowest exam score in the class, way down there, and the max value would be the very highest exam score in the class, way up here. What the 90th percentile does is tell you what place you are in relative to the other exam scores in the class. So maybe we say right around here, this might be the 90th percentile. What that means is that 90 percent of exam scores are smaller than your exam score. 90 percent of exam scores are over here; they're smaller than whatever score you got on the exam. And this right here, if I can draw it right, would be 10 percent, because you need 100 percent: if 90 percent are below, then 10 percent are above.

So notice it's not saying anything about what score you got. Maybe you got a very high score, maybe you got a 95 out of 100 on the exam. 95 out of 100 might be
the 90th percentile if 90 percent of the other students scored less than you on the exam, or, looking at it from the other direction, if 10 percent of students scored higher than you. Or maybe it was a really, really hard exam and everyone did very poorly. Maybe you got a 72 out of 100 on the exam. It could still be the 90th percentile if 90 percent of the people in the class scored worse than you on the exam and 10 percent of the people scored better. So the 90th percentile is telling you where you are relative to the other data values.

How do we actually calculate a percentile? If we're going to do it by hand, it takes a little bit of work: we would have to put all our data values in order and count to figure out where the 90th percentile is. We don't want to put in that much work, so we're going to let the computer do the hard work for us. If we're doing this in R, we can calculate a percentile using the quantile function.

Now, what is a quantile? First off, a percentile is a special kind of quantile. Remember, the idea of a percentile is that it splits your data into 100 equal pieces. Quantiles also split your data, but into an arbitrary number of pieces. So if you split it into 10 pieces, that's a kind of quantile; if you split it into 70 pieces, that would be a kind of quantile. Percentiles are a very common and special type of quantile. So quantile is just a more general name for what we're seeing with percentiles.

The first thing we are going to tell the quantile function is the data we're looking at, so x is the list of data that we're interested in. The next thing we need to tell it is the relative position that we're interested in. This argument is called probs, short for probabilities. Don't worry too much about why it's called probabilities;
it'll make sense later on in the course. For now, just take it at face value. So x is our list of data, and probs is our relative position in decimal form. It's easiest to think about this in relation to percentiles. If I want, say, the 90th percentile, like in the example above, then what I would put in for that probs argument is probs = 0.90. Just like we convert 90 percent to 0.90, it's the same idea here: if we're interested in the 90th percentile, 90 percent as a decimal is 0.90.

So let's take a look at an example and see this function in action. Listed are 29 ages for Academy Award-winning best actors. Find the 73rd percentile. Explain in words what this value represents.

So we have a list of data, 29 ages, and we want to find the 73rd percentile. We're going to do this using our calculator, so I'm going to pull the calculator over here. The first thing I want is the list of ages. Remember how we do a list in R: we start with c, because we want to combine the values into a list, and then we add some parentheses. Anything within the parentheses will be our list. Now I'm just going to copy and paste this list into my calculator. The values are separated with commas, and all the values are within the parentheses. So that's my list.

Now I want to give my list a name. I can call it anything I want, but because this data is the ages of the Academy Award-winning best actors, I'm going to call it ages, just to help me remember what it represents. Then I go to the end of the line and press enter so I can go to the next line. Now that I have all my data, I'm going to use that quantile function. I tell it the list of data I'm interested in; that's the first thing I tell it. I've called my list of data ages. And now I tell it what percentile I want with the probs argument. Remember, I want the 73rd percentile, so I need to convert that to a
decimal form: 73 percent as a decimal is 0.73. So this will tell me the 73rd percentile for the data that I have named ages. Let's evaluate, do the calculation, and I get that my 73rd percentile is 63.32. In this case we're measuring age, so that's 63.32 years old.

We still need to explain in words what the value represents. So what does this actually mean? From what we talked about before, a number is the 73rd percentile if it is larger than 73 percent of the other data values. So 63.32 is larger than 73 percent of the ages of Academy Award-winning best actors. That's one way to interpret the value. If you want to flip it around, you could say 73 percent of the ages of Academy Award-winning best actors are smaller than this. They mean the same thing; it's just saying it in a different order. So 73 percent of Academy Award-winning best actors are going to be younger than 63.32 years old.

Or, if we look at it from the other side, 73 percent plus 27 percent is 100 percent. So if 63.32 years old is larger than 73 percent of the ages, then 63.32 years old is also smaller than 27 percent of the ages. Together, if you look at both sides, you have to get 100 percent.
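
For reference, here is a rough sketch of what those steps might look like typed out in R. The 29 ages from the exercise are not reproduced in this transcript, so the ages below are made up purely for illustration and the result will not be 63.32. The names p73, below, and above are just labels chosen for this sketch.

```r
# Hypothetical ages, made up for illustration only.
# These are NOT the 29 ages from the exercise, so the result will not be 63.32.
ages <- c(31, 35, 38, 40, 42, 45, 47, 51, 56, 60, 62)

# 73rd percentile: x is the data, probs is the relative position as a decimal.
p73 <- quantile(ages, probs = 0.73)
p73

# Check the interpretation: what share of the ages fall below and above it?
below <- mean(ages < p73)
above <- mean(ages > p73)
below
above
```

With a small list like this, the two shares come out only roughly at 73 percent and 27 percent, but they still add up to 100 percent. Also, by default quantile interpolates between neighboring data values, which is why a percentile can come out as a number that is not itself in the list.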