Lecture on Percentiles and Measures of Data Dispersion

Jun 25, 2024

Introduction to Percentiles

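  • A percentile is not a percentage: a percentile is a data value, while a percentage is a proportion.
  • Definition: For 0 < p < 1, the 100p-th percentile is a data value such that at least 100p% of the data are ≤ it and at least 100(1 − p)% are ≥ it.
  • Example: The 50th percentile is the median (at least 50% of the data are ≤ it and at least 50% are ≥ it).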

Example of Percentile Calculation

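  • 99th Percentile (p = 0.99): at least 100 × 0.99 = 99% of the data are ≤ the value and at least 1% are ≥ it.
  • 1st Percentile (p = 0.01): at least 1% of the data are ≤ the value and at least 99% are ≥ it.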
  • Algorithm for Percentile Calculation:
    1. Arrange data in ascending order.
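    2. Calculate np, where n is the number of observations and p is the proportion (for example, p = 0.25 for the 25th percentile).
    3. If np is not an integer, round up to the smallest integer ≥ np; the ordered value at that position is the percentile.
    4. If np is an integer, the percentile is the average of the ordered values at positions np and np + 1.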

Detailed Examples

  • n = 5, p = 0.5: np = 2.5 is not an integer, so round up to 3 → the 3rd ordered value (the median).
  • n = 6, p = 0.5: np = 3 is an integer → the average of the ordered values at the 3rd and 4th positions.
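
The percentile rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `percentile` and the sample data are assumptions made for the example, and the tolerance check only guards against floating-point noise when testing whether np is an integer.

```python
import math


def percentile(data, p):
    """100p-th percentile by the round-up / average rule (0 < p < 1).

    Hypothetical helper written for illustration only.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")

    xs = sorted(data)      # Step 1: arrange data in ascending order
    n = len(xs)
    np_ = n * p            # Step 2: calculate np

    nearest = round(np_)
    if nearest >= 1 and math.isclose(np_, nearest, abs_tol=1e-9):
        # Step 4: np is an integer, so average the values at
        # positions np and np + 1 (1-based positions).
        k = int(nearest)
        return (xs[k - 1] + xs[k]) / 2

    # Step 3: np is not an integer, so take the value at the
    # smallest integer position >= np (1-based position).
    k = math.ceil(np_)
    return xs[k - 1]


odd_sample = [50, 10, 40, 20, 30]        # n = 5, np = 2.5 -> 3rd value
even_sample = [10, 20, 30, 40, 50, 60]   # n = 6, np = 3 -> avg of 3rd and 4th

print(percentile(odd_sample, 0.5))   # 30
print(percentile(even_sample, 0.5))  # 35.0
```

The two calls mirror the n = 5 and n = 6 cases: the first returns the 3rd ordered value, the second the average of the 3rd and 4th.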

Using Google Sheets for Percentile Calculation

  • Percentile Function in Google Sheets:
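    • Formula: =PERCENTILE(data, p), where p is a proportion between 0 and 1.
    • Example: =PERCENTILE(B2:B11, C2), with the data in B2:B11 and p in C2.
    • Google Sheets uses a slightly different algorithm that interpolates between adjacent ordered values, so its result can differ from the round-up/average rule.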

Google Sheets Algorithm for Percentile Calculation

  1. Arrange data in ascending order.
  2. Calculate the rank: p(n − 1) + 1, where n is the number of observations.
  3. Split the rank into its integer part i and its fractional part f.
  4. Find the ordered data values at positions i and i + 1, written x_i and x_(i+1).
  5. Calculate the percentile value: x_i + f × (x_(i+1) − x_i).
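
The interpolation steps above can be sketched in Python as below. This follows the rank formula as stated in these notes and is only an illustration of that procedure, not Google Sheets' actual source; the function name `percentile_interpolated` and the sample data are assumptions.

```python
def percentile_interpolated(data, p):
    """Percentile by linear interpolation on rank p(n - 1) + 1 (0 <= p <= 1).

    Hypothetical helper written for illustration only.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")

    xs = sorted(data)            # Step 1: ascending order
    n = len(xs)
    rank = p * (n - 1) + 1       # Step 2: 1-based rank
    i = int(rank)                # Step 3: integer part ...
    f = rank - i                 # ... and fractional part

    if i >= n:                   # p = 1: there is no x_(i+1), return the maximum
        return xs[-1]

    # Steps 4-5: x_i + f * (x_(i+1) - x_i), using 1-based positions
    return xs[i - 1] + f * (xs[i] - xs[i - 1])


sample = [10, 20, 30, 40, 50, 60]

# p = 0.25: rank = 0.25 * 5 + 1 = 2.25, so i = 2 and f = 0.25
print(percentile_interpolated(sample, 0.25))  # 22.5
# p = 0.5: rank = 3.5, halfway between the 3rd and 4th values
print(percentile_interpolated(sample, 0.5))   # 35.0
```

For p = 0.25 the round-up/average rule would give the 2nd ordered value (20, since np = 1.5 rounds up to 2), while interpolation gives 22.5, which shows how the two methods can disagree on the same data.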

Importance of Percentiles in Data Analysis

  • Quartiles:
    • 25th percentile (Q1): First quartile
    • 50th percentile (Q2): Median or second quartile
    • 75th percentile (Q3): Third quartile
  • Five Number Summary: Minimum, Q1, Median, Q3, Maximum.

Quartiles and Interquartile Range (IQR)

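  • The quartiles Q1, Q2, and Q3 break the ordered data set into four parts, each containing roughly 25% of the observations.
  • Interquartile Range (IQR): IQR = Q3 − Q1, a measure of dispersion that describes the spread of the middle 50% of the data.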
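
As a rough illustration, the five-number summary and the IQR can be computed with the round-up/average percentile rule from earlier in these notes. The helper names (`_percentile`, `five_number_summary`, `iqr`) and the sample data are assumptions for the example; a spreadsheet or statistics library that interpolates may report slightly different quartiles.

```python
import math


def _percentile(xs, p):
    """Round-up / average rule on already-sorted data (illustrative helper)."""
    np_ = len(xs) * p
    nearest = round(np_)
    if nearest >= 1 and math.isclose(np_, nearest, abs_tol=1e-9):
        k = int(nearest)
        return (xs[k - 1] + xs[k]) / 2   # np is an integer: average two values
    return xs[math.ceil(np_) - 1]        # otherwise: round the position up


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    if not data:
        raise ValueError("data must be non-empty")
    xs = sorted(data)
    return (
        xs[0],
        _percentile(xs, 0.25),
        _percentile(xs, 0.50),
        _percentile(xs, 0.75),
        xs[-1],
    )


def iqr(data):
    """Interquartile range: Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


sample = [2, 4, 4, 5, 7, 8, 9, 10]    # n = 8
print(five_number_summary(sample))    # (2, 4.0, 6.0, 8.5, 10)
print(iqr(sample))                    # 4.5
```

With n = 8, np is an integer for all three quartiles (2, 4, and 6), so each quartile is the average of two adjacent ordered values.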

Conclusion

  • Covered definitions and computations of percentiles, quartiles, and measures of dispersion.
  • Example calculations and use of Google Sheets.
  • Introduction to the next topic: Association between two variables.