Understanding Histograms and Data Percentiles

Sep 17, 2024

Lecture Notes on Histograms and Data Distribution

Key Concepts

  • Understanding how to locate key percentiles in a histogram:
    • 25th Percentile (Q1)
    • Median
    • 75th Percentile (Q3)

Total Data Points

  • Total Pieces of Data: 403
    • The total is needed to work out the position of each percentile.

Finding the Median

  • Definition: The median is the middle piece of data once the values are sorted.
  • Calculation:
    • Divide total data points by 2: 403 / 2 = 201.5
    • Round up to find the median position: Data Piece Number = 202 (201 pieces lie below it and 201 above).
  • Finding Median:
    • Cumulative counting in histogram bins:
      • 1st Bin: 36
      • 2nd Bin: 54 (Total: 90)
      • 3rd Bin: 69 (Total: 159)
      • 4th Bin: 81 (Total: 240)
    • The 202nd piece lies past 159 but within 240, so the median falls in the 4th bin: 25 up to, but not including, 30 minutes.
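
The cumulative count above can be sketched in a few lines of Python. This is only an illustration: the function name `bin_containing` is made up for this example, and the bin edges assume equal 5-minute bins with the first bin starting at 10 minutes, which is inferred from the 3rd and 4th bin ranges given in these notes rather than stated outright.

```python
from itertools import accumulate

# Counts for the first four bins, as read off the histogram in the notes.
counts = [36, 54, 69, 81]
# Assumed bin edges in minutes: width 5, first bin starting at 10.
edges = [10, 15, 20, 25, 30]


def bin_containing(position, counts, edges):
    """Return the (lower, upper) edges of the bin holding a 1-based position."""
    for i, running_total in enumerate(accumulate(counts)):
        if position <= running_total:
            return edges[i], edges[i + 1]
    raise ValueError("position lies beyond the bins supplied")


total = 403
median_position = (total + 1) // 2  # middle piece for an odd total
print(median_position, bin_containing(median_position, counts, edges))
```

The lower edge is included in the bin and the upper edge is not, matching the "25 up to not including 30" reading.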

Finding Q1 (25th Percentile)

  • Definition: Q1 is the median of the lower half of the data.
  • Data in Bottom Half: 201 pieces (everything below the median, positions 1 to 201).
  • Calculation:
    • Divide bottom data points by 2: 201 / 2 = 100.5
    • Round up to find Q1 position: Data Piece Number = 101.
  • Finding Q1:
    • Cumulative counting:
      • 1st Bin: 36
      • 2nd Bin: 54 (Total: 90)
      • 3rd Bin: 69 (Total: 159)
    • The 101st piece lies past 90 but within 159, so Q1 falls in the 3rd bin: 20 up to, but not including, 25 minutes.
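
As a rough check of the position arithmetic, the sketch below splits the 403 pieces around the median and then finds which running total first reaches the Q1 position. The variable names are illustrative only, and it follows the convention used in these notes of leaving the median out of both halves.

```python
total = 403

# With an odd total, the median takes one position by itself,
# leaving (total - 1) / 2 pieces on each side of it.
median_position = (total + 1) // 2   # 202
half_size = (total - 1) // 2         # 201
q1_position = (half_size + 1) // 2   # 101, counted from the bottom

# Running totals for the first three bins, as in the notes.
cumulative = []
running = 0
for count in [36, 54, 69]:
    running += count
    cumulative.append(running)       # builds up 36, 90, 159

# The first running total that reaches q1_position marks the bin holding Q1.
q1_bin_number = next(
    i for i, c in enumerate(cumulative, start=1) if c >= q1_position
)
print(q1_position, cumulative, q1_bin_number)
```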

Finding Q3 (75th Percentile)

  • Definition: Q3 is the median of the upper half of the data.
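  • Data in Top Half: 201 pieces (everything above the median, positions 203 to 403).
  • Calculation:
    • Divide top data points by 2: 201 / 2 = 100.5, rounded up to 101.
    • Q3 is the 101st piece counting down from the top.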
  • Finding Q3:
    • Cumulative counting from the top:
      • Last Bin: 17
      • Next: 21 or 22, taken as 21 (Total: 38)
      • Next: 25 (Total: 63)
      • Remaining needed to reach 101: 101 - 63 = 38 more pieces, which the next bin down (about 43 pieces) covers.
    • Q3 falls in that bin: 35 up to, but not including, 40 minutes.
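
The top-down count can be checked the same way. The sketch below uses only the three highest bar counts listed in these notes, takes the "21 or 22" bar as 21 (an assumption), and works out how many pieces are still needed from the next bin down. It also converts the position to a count from the bottom, which is the same piece seen from the other end.

```python
total = 403
half_size = (total - 1) // 2              # 201 pieces above the median
q3_from_top = (half_size + 1) // 2        # 101

# Counts for the three highest bins, listed from the top down.
# The second bar is read as "21 or 22"; 21 is assumed here.
top_counts = [17, 21, 25]

running = 0
for count in top_counts:
    running += count                      # 17, then 38, then 63

still_needed = q3_from_top - running      # pieces required from the next bin down
q3_from_bottom = total - q3_from_top + 1  # same piece, counted from the bottom
print(running, still_needed, q3_from_bottom)
```

Counting from the top is simply a shortcut here: it avoids adding up every bin from the bottom to reach the same piece.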

Summary of Key Values

  • Q1: 20 up to, but not including, 25 minutes
  • Median: 25 up to, but not including, 30 minutes
  • Q3: 35 up to, but not including, 40 minutes

Importance

  • Q1 and Q3 give the interquartile range (IQR = Q3 - Q1), which is the usual starting point for identifying outliers when analyzing a histogram.

Additional Tasks

  • Next Steps: Estimate the count for each histogram bar, then practice locating Q1, the median, and Q3 with those counts.