Overview
This lecture focused on analyzing the shape of histograms for numerical variables by examining symmetry/skewness, the number of modes, and the presence of outliers.
Analyzing Histogram Shapes
- Histograms are used to graph numerical variables such as number of visits or birth weight.
- Three main questions to ask about histogram shape: symmetry/skewness, number of modes, and presence of outliers.
- A distribution can be symmetric, skewed left (long tail on the left), or skewed right (long tail on the right).
- Mode refers to the number of distinct peaks: unimodal (one), bimodal (two), or multimodal (more than two).
- Outliers are data points that are far from the rest of the distribution.
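This is a minimal Python sketch of where a histogram's bar heights come from. The range of the data is split into equal-width bins, and the values in each bin are counted. The function name `bin_counts`, the equal-width binning, and the `visits` values are hypothetical illustrations. They are not data or code from the lecture.

```python
from typing import List, Sequence

def bin_counts(values: Sequence[float], num_bins: int) -> List[int]:
    """Count how many values fall into each of `num_bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Bin index for v; if all values are equal (width 0), use the first bin.
        i = int((v - lo) / width) if width > 0 else 0
        # The maximum value would land one past the end, so clamp it into the last bin.
        counts[min(i, num_bins - 1)] += 1
    return counts

# Hypothetical "number of visits" data: mostly small values plus one large value.
visits = [1, 1, 2, 2, 2, 3, 3, 4, 5, 12]
print(bin_counts(visits, 4))  # [7, 2, 0, 1]
```

Read left to right, the counts show tall bars on the left that taper off into a tail on the right, which is a right-skewed shape. The single value 12 sits isolated in the far-right bin as a possible outlier.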
Example: Left Graph (Number of Visits)
- Shape: Skewed right, because the bars taper off into a long tail of short bars on the right.
- Mode: Unimodal since only one clear peak is present.
- Outliers: Obvious outliers in the right tail.
Example: Right Graph
- Shape: Skewed left due to a tail on the left.
- Mode: Unimodal since there is only one peak in the outline.
- Outliers: None are obvious; the graph alone is inconclusive about outliers.
Clarifying Modes
- Unimodal means one peak, judged from the outline of the bars, not from the heights of individual bars.
- Bimodal requires two distinct peaks, each higher than the bars around it; the two peaks need not be the same height.
- Multiple tall bars together do not automatically mean bimodal if the outline forms just one peak.
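The rule that modes are read from the outline can be sketched in Python. This sketch assumes a peak is a bar, or a flat run of equal-height bars, that is strictly taller than its neighbors on both sides. The hypothetical helpers `count_peaks` and `describe_modes` merge equal-height runs so that a flat top counts once. As a result, a group of tall bars that rises and falls together forms a single peak.

```python
from typing import List, Sequence

def count_peaks(counts: Sequence[int]) -> int:
    """Count peaks in a histogram outline, given bin counts from left to right."""
    # Merge runs of equal heights so a flat top made of several bars counts once.
    heights: List[int] = []
    for c in counts:
        if not heights or heights[-1] != c:
            heights.append(c)
    peaks = 0
    for i, h in enumerate(heights):
        # Beyond either edge of the graph, treat the height as lower than any bar.
        left = heights[i - 1] if i > 0 else -1
        right = heights[i + 1] if i + 1 < len(heights) else -1
        if h > left and h > right:
            peaks += 1
    return peaks

def describe_modes(counts: Sequence[int]) -> str:
    """Label the outline as unimodal, bimodal, or multimodal."""
    n = count_peaks(counts)
    if n == 0:
        return "no peak"
    return {1: "unimodal", 2: "bimodal"}.get(n, "multimodal")

print(describe_modes([1, 5, 6, 5, 1]))  # unimodal: adjacent tall bars form one peak
print(describe_modes([2, 7, 3, 4, 2]))  # bimodal: the second peak is lower but distinct
```

This sketch is stricter than reading a graph by eye. Any dip between bars, however small, separates two peaks. A person would usually ignore minor wiggles in the outline.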
Key Terms & Definitions
- Histogram: a graph of adjacent bars showing how many values of a numerical variable fall into each interval (bin).
- Skewness: describes whether the histogram's longer tail is on the right (right-skewed) or on the left (left-skewed).
- Mode: the number of peaks in a distribution's outline (unimodal: one, bimodal: two, multimodal: more than two).
- Outlier: a data point that lies far from the rest of the distribution.
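The lecture judges skewness by eye. The standard moment coefficient of skewness, g1 = m3 / m2^(3/2), gives a numerical complement to that visual check, although the lecture does not use it. The coefficient is positive when the longer tail is on the right and negative when it is on the left. In the sketch below, the function names and the cutoff of 0.5 for calling a distribution skewed are illustrative assumptions, not course rules. The `visits` values are hypothetical.

```python
from statistics import fmean
from typing import Sequence

def sample_skewness(values: Sequence[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population-style moments)."""
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)  # average squared deviation
    m3 = fmean((v - mean) ** 3 for v in values)  # average cubed deviation
    if m2 == 0:
        return 0.0  # all values identical: no spread, no tail
    return m3 / m2 ** 1.5

def skew_direction(values: Sequence[float], cutoff: float = 0.5) -> str:
    """Classify the tail direction; `cutoff` is an arbitrary illustrative threshold."""
    g1 = sample_skewness(values)
    if g1 > cutoff:
        return "skewed right"
    if g1 < -cutoff:
        return "skewed left"
    return "roughly symmetric"

visits = [1, 1, 2, 2, 2, 3, 3, 4, 5, 12]  # hypothetical data with a long right tail
print(skew_direction(visits))  # skewed right
```

The cubes in m3 keep the sign of each deviation, so a few large values far out on one side dominate and set the sign of g1.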
Action Items / Next Steps
- Learn the mathematical rule for defining outliers in section 3.5 of the course.
- Practice identifying shape, mode, and outliers in more histogram examples.