Understanding Five Number Summary and Box Plots

May 30, 2025

Five Number Summary, Box Plots, and Outliers

Overview

  • The Five Number Summary gives a concise description of a data distribution using only five numbers:
    1. Minimum
    2. First Quartile (Q1)
    3. Median
    4. Third Quartile (Q3)
    5. Maximum

Key Concepts

Five Number Summary

  • Minimum: Smallest value in the data set.
  • Maximum: Largest value in the data set.
  • Median: Middle value of the ordered data; about 50% of the data values lie below it and about 50% lie above it.
  • First Quartile (Q1): Median of the lower half of the data set; about 25% of data values are below it.
  • Third Quartile (Q3): Median of the upper half of the data set; about 75% of data values are below it.

Calculating the Five Number Summary

  • Order data values from smallest to largest.
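  • Find the median, which splits the ordered data into a lower half and an upper half; Q1 is the median of the lower half and Q3 is the median of the upper half.
  • When the number of values is odd, textbooks differ on whether the median itself belongs to each half; a common convention leaves it out of both.
  • The Interquartile Range (IQR) is calculated as Q3 - Q1; it measures the spread of the middle 50% of the data.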
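
The steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name five_number_summary and the sample data are hypothetical, and it assumes the convention that the median is left out of both halves when the number of values is odd. Statistical software often uses other quartile rules and may give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the median-of-halves method.

    Assumption: for an odd count, the median is excluded from both halves.
    """
    data = sorted(values)              # step 1: order smallest to largest
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]                # lower half of the ordered data
    upper = data[half:] if n % 2 == 0 else data[half + 1:]  # upper half
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

# Hypothetical sample data, for illustration only.
sample = [7, 1, 3, 9, 5, 11, 13, 15]
summary = five_number_summary(sample)
iqr = summary["q3"] - summary["q1"]    # IQR = Q3 - Q1
print(summary, "IQR =", iqr)
```

For this sample the ordered data are 1, 3, 5, 7, 9, 11, 13, 15, so Q1 = 4, the median = 8, Q3 = 12, and the IQR = 8.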

Box Plots

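  • Box Plot: Visual representation of the five number summary, drawn along a number line.
    • In a horizontal box plot, vertical lines mark each of the five numbers.
    • Horizontal line extensions ("whiskers") run from the box out to the minimum and the maximum, showing data spread beyond the quartiles.
    • The box spans Q1 to Q3, so its length is the IQR.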
  • Modified Box Plot: Adjusted for outliers.
    • Outliers are shown as separate dots.
    • Whiskers extend only to the lowest and highest non-outlier values.
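
As a rough sketch of how a modified box plot separates whiskers from outlier dots, the Python below takes the outlier boundaries as given inputs and returns the whisker endpoints on both sides. The function name modified_whiskers and the data are hypothetical; the boundaries 8.5 and 52.5 are assumed to have been worked out already.

```python
def modified_whiskers(values, low_boundary, high_boundary):
    """Return (whisker_low, whisker_high, outliers) for a modified box plot.

    Values outside the boundaries are plotted as separate dots; the whiskers
    stop at the most extreme values that are still inside the boundaries.
    """
    data = sorted(values)
    inside = [x for x in data if low_boundary <= x <= high_boundary]
    outliers = [x for x in data if x < low_boundary or x > high_boundary]
    if not inside:
        raise ValueError("no values fall inside the boundaries")
    return inside[0], inside[-1], outliers

# Hypothetical data; the boundaries are assumed to be known in advance.
data = [20, 24, 26, 29, 31, 35, 37, 59]
whisker_low, whisker_high, outliers = modified_whiskers(data, 8.5, 52.5)
print(whisker_low, whisker_high, outliers)
```

Here the whiskers would run from 20 to 37, and 59 would be drawn as a separate dot.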

Identifying Outliers

  • A data value is an outlier if:
    • Less than Q1 - 1.5 * IQR
    • Greater than Q3 + 1.5 * IQR
  • Example: For a data set with Q1 = 25, Q3 = 36, and IQR = 11:
    • Low boundary for outliers: 25 - 1.5 * 11 = 25 - 16.5 = 8.5
    • High boundary for outliers: 36 + 1.5 * 11 = 36 + 16.5 = 52.5
    • Any value less than 8.5 or greater than 52.5 is an outlier (e.g., 59 is an outlier).
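
The boundary arithmetic in this example can be checked with a short Python sketch. The names outlier_boundaries and is_outlier are hypothetical helpers introduced only for illustration; the inputs Q1 = 25 and Q3 = 36 come from the example above.

```python
def outlier_boundaries(q1, q3, k=1.5):
    """Return (low, high) boundaries: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3):
    """True if x lies strictly below the low or above the high boundary."""
    low, high = outlier_boundaries(q1, q3)
    return x < low or x > high

low, high = outlier_boundaries(25, 36)   # Q1 and Q3 from the example
print(low, high)                          # boundaries for the example
print(is_outlier(59, 25, 36))             # 59 is above the high boundary
print(is_outlier(40, 25, 36))             # 40 is inside the boundaries
```

Note that a value exactly equal to a boundary is not counted as an outlier under the "less than" / "greater than" wording used here.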

Comparing Data Sets

  • Side-by-side box plots, drawn on a common scale, allow visual and numerical comparison of center (median) and spread (IQR, range) between different data sets.
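
One possible way to make the numerical side of such a comparison concrete is sketched below in Python. The helper names and the two five number summaries are hypothetical, made up purely for illustration; each summary is assumed to be a tuple in the order (min, Q1, median, Q3, max).

```python
def larger(name_a, value_a, name_b, value_b):
    """Return the name attached to the larger value, or 'tie'."""
    if value_a > value_b:
        return name_a
    if value_b > value_a:
        return name_b
    return "tie"

def compare_summaries(name_a, a, name_b, b):
    """Compare two (min, Q1, median, Q3, max) tuples on center and spread."""
    return {
        "higher_median": larger(name_a, a[2], name_b, b[2]),
        "larger_iqr": larger(name_a, a[3] - a[1], name_b, b[3] - b[1]),
        "larger_range": larger(name_a, a[4] - a[0], name_b, b[4] - b[0]),
    }

# Hypothetical five number summaries for two data sets.
set_a = (50, 62, 70, 78, 95)
set_b = (40, 65, 75, 80, 90)
result = compare_summaries("A", set_a, "B", set_b)
print(result)
```

With these made-up numbers, set B has the higher median (75 vs. 70) and the larger range (50 vs. 45), while set A has the slightly larger IQR (16 vs. 15).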