
Understanding Percentiles in Data Sets

Mar 21, 2025

Lecture Notes: Measuring Data Position in a Data Set

Introduction to Data Positioning

  • Focus on methods to measure the position of data relative to other values in a data set.

Percentiles

  • Definition: Percentiles divide an ordered data set into 100 equal parts.
  • Purpose: Indicate the relative position of a score within a data set.
  • Example: Scoring in the 90th percentile means 90% of scores are below yours.
    • Does not indicate the actual score achieved.
    • Implies rank relative to other scores.

Understanding Percentile through an Example

  • Scenario: An exam score is in the 90th percentile.
  • Explanation: 90% of scores are lower than yours; 10% are higher.
  • Notes: A high score (e.g., 95/100) or a relatively lower score (e.g., 72/100) can both be in the 90th percentile depending on other scores.
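The note above can be checked with a short R sketch. Both score vectors are made up for illustration (they are not from the lecture), chosen so that the 90th percentile lands on a different score in each class.

```r
# Hypothetical exam scores for two classes (illustrative values only).
hard_exam <- c(41, 48, 52, 55, 58, 60, 63, 66, 70, 72, 75)
easy_exam <- c(70, 74, 78, 80, 83, 85, 88, 90, 93, 95, 98)

# 90th percentile of each class, using R's default quantile rule.
p90_hard <- quantile(hard_exam, probs = 0.90)
p90_easy <- quantile(easy_exam, probs = 0.90)

print(p90_hard)
print(p90_easy)
```

With these made-up values, a 72 on the hard exam and a 95 on the easy exam sit at the same percentile of their own class, even though the raw scores differ by 23 points.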

Calculating Percentiles

  • Manual Calculation: Requires ordering and examining data values to find the specific percentile position.
  • Using Software:
    • Use the quantile function in R.
    • Percentiles are a specific type of quantile.
    • Quantile: General term for the values that divide ordered data into any number of equal-sized parts (percentiles use 100 parts).
    • Function Usage in R:
      • x: Numeric vector of data values.
      • probs: Relative position in decimal form (e.g., 90th percentile is 0.90).
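A minimal sketch of both routes follows. The lecture does not say which manual rule it uses, so the helper below assumes one common convention: linear interpolation between neighbouring ordered values, which is what R's quantile function does by default (type 7). The name manual_percentile and the data values are hypothetical.

```r
# Hypothetical helper: percentile by linear interpolation between
# ordered values (assumed convention; matches quantile()'s default type 7).
manual_percentile <- function(x, p) {
  sorted <- sort(x)        # order the data
  n <- length(sorted)
  h <- (n - 1) * p + 1     # fractional position of the percentile
  lo <- floor(h)           # position of the value at or just below h
  hi <- ceiling(h)         # position of the value at or just above h
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
}

values <- c(12, 7, 3, 18, 10, 15, 5, 20)   # made-up data

by_hand  <- manual_percentile(values, 0.25)
by_quant <- quantile(values, probs = 0.25)

# probs also accepts several positions at once.
several <- quantile(values, probs = c(0.25, 0.50, 0.90))

print(by_hand)
print(by_quant)
print(several)
```

Other textbooks and software use slightly different position rules, so a hand calculation done under a different convention may not match quantile's default output exactly.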

Example in R: Calculating the 73rd Percentile

  • Data Set: 29 ages of Academy award-winning actors.
  • Procedure:
    1. Create a numeric vector called ages containing the data values.
    2. Call the quantile function with probs set to 0.73 to get the 73rd percentile.
  • Result: 73rd percentile is 63.32 years.
  • Interpretation:
    • 63.32 years is greater than about 73% of the ages.
    • Equivalent to saying about 73% of the ages are less than 63.32 years.
    • It also implies that 63.32 years is less than the remaining 27% or so of the ages (completing the 100%).
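The procedure and its interpretation can be sketched as follows. The lecture's 29 ages are not reproduced in these notes, so the ages vector below holds placeholder values only, and its 73rd percentile will not be 63.32.

```r
# Placeholder ages, NOT the lecture's 29 values (illustrative only).
ages <- c(29, 33, 35, 37, 41, 44, 48, 52, 56, 60, 62, 67, 71, 76)

# Step 2 of the procedure: 73rd percentile via probs = 0.73.
p73 <- quantile(ages, probs = 0.73)

# Check the interpretation: share of ages below and above the result.
share_below <- mean(ages < p73)
share_above <- mean(ages > p73)

print(p73)
print(share_below)
print(share_above)
```

With a small sample, the share below the percentile is only approximately 73%, because the data can only split at whole observations.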

Conclusion

  • Percentiles describe where a value ranks relative to the rest of a data set.
  • Useful for interpreting scores, ages, or other measurable data in different contexts.