Overview
This lecture explains how to interpret and construct box (box-and-whisker) plots for numerical data, focusing on the five-number summary, identifying outliers, and relating box plots to histograms.
Introduction to Box Plots
- Box plots (also called box-and-whisker plots) are used to graph numerical data.
- Box plots visually summarize the five-number summary: minimum, Q1, median, Q3, and maximum.
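The five-number summary can be computed directly, as in the minimal sketch below. The function name five_number_summary and the sample data are hypothetical, and the quartile method (Python's statistics.quantiles with method="inclusive") is an assumption: textbooks and software use several slightly different quartile conventions, so results may differ a little from a hand calculation.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    # Assumption: the "inclusive" method, which interpolates between data
    # points and never returns a quartile outside the observed range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(five_number_summary(scores))    # (1, 3.0, 5.0, 7.0, 9)
```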
Structure of a Box Plot
- The main rectangle ("box") represents the interquartile range (IQR) from Q1 (lower quartile) to Q3 (upper quartile).
- The line inside the box marks the median value.
- The ends of the box are Q1 and Q3, so the box length equals the IQR (Q3 − Q1).
- "Whiskers" extend from the box to the minimum and maximum values not considered outliers.
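To make the parts of the plot concrete, here is a rough sketch that draws a one-line text box plot from five summary values. The function text_box_plot, its width parameter, and the symbols chosen ("|" for whisker ends, "[" and "]" for Q1 and Q3, "M" for the median) are all illustrative assumptions, not a standard notation.

```python
def text_box_plot(low, q1, median, q3, high, width=41):
    """Draw a one-line text box plot scaled to `width` characters."""
    if high <= low:
        raise ValueError("high must be greater than low")

    def column(x):
        # Map a data value to a character position between 0 and width - 1.
        return round((x - low) / (high - low) * (width - 1))

    # "=" fills the box (Q1 to Q3); "-" fills the whiskers.
    cells = ["=" if column(q1) <= i <= column(q3) else "-" for i in range(width)]
    cells[column(low)] = "|"       # end of the left whisker
    cells[column(high)] = "|"      # end of the right whisker
    cells[column(q1)] = "["        # left edge of the box
    cells[column(q3)] = "]"        # right edge of the box
    cells[column(median)] = "M"    # median line inside the box
    return "".join(cells)

print(text_box_plot(1, 3, 5, 7, 9))
```

The distance between "[" and "]" is the IQR, and the "M" sits wherever the median falls inside the box.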
Dealing with Outliers in Box Plots
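- Outliers are plotted as individual dots beyond the whiskers. A common convention flags any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
- Each whisker ends at the most extreme data point that is not an outlier.
- When outliers are present, the "minimum" and "maximum" marked by the whisker ends are the smallest and largest non-outlier values, not the true extremes of the data set.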
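The whisker placement described above can be sketched as follows. This assumes the widely used 1.5 × IQR rule for flagging outliers, which these notes do not spell out; the function name whiskers_and_outliers, the parameter k, and the sample data are hypothetical, and the quartile method is the same assumed "inclusive" convention from Python's statistics module.

```python
import statistics

def whiskers_and_outliers(data, k=1.5):
    """Return (lower whisker end, upper whisker end, list of outliers)."""
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond k * IQR from the box count as outliers.
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    # Whiskers stop at the most extreme values that are not outliers.
    return inside[0], inside[-1], outliers

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data with one high value
print(whiskers_and_outliers(data))    # (2, 10, [30])
```

Here Q1 = 5 and Q3 = 9, so the IQR is 4 and the fences sit at −1 and 15. The value 30 lies above the upper fence, so it is drawn as a dot and the right whisker stops at 10.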
Interpreting Data Distribution
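- About 25% of the data falls in each of the four intervals: min to Q1, Q1 to median, median to Q3, and Q3 to max.
- Box plots break the data into four groups of roughly equal size (quarters), separated by the quartiles. A longer section means the values there are more spread out, not that it contains more data.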
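A quick way to check the "25% in each section" idea is to count the values that land in each interval. The sketch below is illustrative only: quarter_shares is a hypothetical helper, the data are made up, and the shares come out exactly equal only when the data set divides evenly and has no ties at the quartiles.

```python
import statistics

def quarter_shares(data):
    """Return the fraction of values in each of the four quartile intervals."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    counts = [0, 0, 0, 0]
    for x in values:
        if x <= q1:
            counts[0] += 1      # min to Q1
        elif x <= median:
            counts[1] += 1      # Q1 to median
        elif x <= q3:
            counts[2] += 1      # median to Q3
        else:
            counts[3] += 1      # Q3 to max
    return [c / len(values) for c in counts]

print(quarter_shares([1, 2, 3, 4, 5, 6, 7, 8]))  # [0.25, 0.25, 0.25, 0.25]
```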
Box Plots and Histograms
- Box plots and histograms both show how data are distributed: a histogram shows the shape through bar heights, while a box plot shows the five-number summary.
- Right-skewed data typically have a longer right whisker and correspond to histograms with a long tail on the right.
- Left-skewed data typically have a longer left whisker and correspond to histograms with a long tail on the left.
- Symmetric data have whiskers of roughly equal length, with the median near the center of the box.
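The whisker comparison above can be turned into a rough rule of thumb, sketched below. The function shape_from_whiskers and its tolerance parameter (how different the whiskers must be before the data are called skewed) are arbitrary assumptions for illustration; a real assessment should also look at the histogram and at where the median sits in the box.

```python
def shape_from_whiskers(low, q1, q3, high, tolerance=0.2):
    """Classify shape by comparing left and right whisker lengths."""
    left = q1 - low      # length of the left whisker
    right = high - q3    # length of the right whisker
    longest = max(left, right)
    # Assumption: whiskers within `tolerance` of each other (relative to
    # the longer one) are treated as equal.
    if longest == 0 or abs(right - left) <= tolerance * longest:
        return "roughly symmetric"
    return "right-skewed" if right > left else "left-skewed"

print(shape_from_whiskers(1, 3, 7, 9))      # roughly symmetric
print(shape_from_whiskers(1, 2, 5, 20))     # right-skewed
print(shape_from_whiskers(0, 15, 18, 20))   # left-skewed
```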
Key Terms & Definitions
- Box Plot/Box-and-Whisker Plot — A graph showing minimum, Q1, median, Q3, and maximum (five-number summary) of numerical data.
- Five-number summary — Minimum, Q1 (lower quartile), median, Q3 (upper quartile), and maximum.
- Interquartile Range (IQR) — The range between Q1 and Q3, representing the middle 50% of data.
- Outlier — Data point far above or below the rest of the data (commonly, more than 1.5 × IQR beyond Q1 or Q3), shown as an individual dot beyond the whiskers.
- Skewed Right — Distribution with a longer right tail/whisker.
- Skewed Left — Distribution with a longer left tail/whisker.
- Symmetric — Distribution with whiskers of similar length on both sides.
Action Items / Next Steps
- When outliers are present, place each whisker at the largest or smallest value that is not an outlier.
- Compare box plots to corresponding histograms to assess distribution shape (symmetry/skewness).
- Practice drawing box plots using datasets with and without outliers.