Overview
This lecture explores different types of data visualizations, focusing on how each effectively represents quantitative data and reveals key information, outliers, and distribution patterns.
Dot Plots and Stem-and-Leaf Plots
- Dot plots replace histogram bars with dots, where each dot represents one data point.
- Dot plots make it easy to count data points, but when values are grouped into bins the exact individual values are still lost.
- Stem-and-leaf plots display actual data values split into "stems" (shared digits/bins) and "leaves" (unique digits).
- Leaves in stem-and-leaf plots are arranged in numerical order, providing more detail than dot plots or histograms.
- Stem-and-leaf plots are typically displayed sideways, with stems listed vertically.
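The stem-and-leaf layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `stem_and_leaf` and the sample `scores` are hypothetical, and it assumes a non-empty list of non-negative integers with the tens as stems and the ones digit as leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return text rows of a stem-and-leaf plot (stem = tens, leaf = ones digit).

    Assumes a non-empty list of non-negative integers.
    """
    groups = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves in numerical order
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)

    rows = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

scores = [12, 15, 21, 21, 27, 34, 38, 39, 45]  # hypothetical sample data
for row in stem_and_leaf(scores):
    print(row)
# 1 | 2 5
# 2 | 1 1 7
# 3 | 4 8 9
# 4 | 5
```

Each row keeps the actual values readable (the row `2 | 1 1 7` stands for 21, 21, 27), while the row lengths give the same shape a histogram would.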
Boxplots (Box-and-Whiskers Plots)
- Boxplots show the interquartile range (IQR) as a box, split at the median.
- The "whiskers" extend to the smallest and largest values that lie within 1.5 × IQR of the quartiles, that is, no lower than Q1 − 1.5 × IQR and no higher than Q3 + 1.5 × IQR.
- Data beyond these limits (the fences) are considered potential "outliers."
- Outliers are not always errors: some are rare but valid observations, while others are mistakes or values irrelevant to the question.
- Boxplots are helpful for comparing distributions and identifying spread, central tendency, and outliers between groups.
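As a rough sketch of the numbers behind a boxplot, the following computes the quartiles, the 1.5 × IQR fences, the whisker ends, and the flagged points. The function name `boxplot_summary` and the sample data are hypothetical. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so other tools may report slightly different quartiles for the same data.

```python
import statistics

def boxplot_summary(values):
    """Return quartiles, fences, whisker ends, and potential outliers."""
    data = sorted(values)
    # Assumption: "inclusive" quartile convention; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample data
summary = boxplot_summary(sample)
print(summary["fences"])    # (-2.0, 14.0)
print(summary["whiskers"])  # (2, 9)
print(summary["outliers"])  # [30]
```

Here the fences sit at -2 and 14, but the whiskers are drawn only to 2 and 9 because those are the most extreme observed values inside the fences. The value 30 is flagged as a potential outlier, not automatically discarded.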
Handling Outliers
- Pre-set rules, like the 1.5 × IQR rule, help flag potential outliers and guide the decision to keep or discard them, but these rules are not absolute.
- Removing or keeping outliers affects the interpretation and informativeness of a visualization.
Cumulative Frequency Plots
- Cumulative frequency plots show the total number of data points up to and including each bin.
- These plots are useful for answering questions like "How many values are less than or equal to a certain number?"
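The running totals behind a cumulative frequency plot can be sketched as below. The function name `cumulative_frequency`, the bin edges, and the sample values are hypothetical. The sketch assumes the upper bin edges are given in ascending order and that each bin includes its upper edge; values above the last edge are ignored.

```python
from itertools import accumulate

def cumulative_frequency(values, upper_edges):
    """Return the count of values <= each upper bin edge.

    Assumes upper_edges is sorted ascending; values above the last
    edge are not counted.
    """
    counts = [0] * len(upper_edges)
    for v in values:
        # Place the value in the first bin whose upper edge it does not exceed.
        for i, edge in enumerate(upper_edges):
            if v <= edge:
                counts[i] += 1
                break
    # A running sum of per-bin counts gives the cumulative totals.
    return list(accumulate(counts))

values = [3, 7, 8, 12, 15, 15, 21, 24]  # hypothetical sample data
edges = [5, 10, 15, 20, 25]             # hypothetical upper bin edges
print(cumulative_frequency(values, edges))  # [1, 3, 6, 6, 8]
```

Reading the result against the edges answers the kind of question posed above: six of the eight values are less than or equal to 15, and the flat step from 15 to 20 shows that no values fall in that bin.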
Interpreting Data Visualizations
- The main goal of a visualization is to communicate useful information.
- Poorly constructed graphs can mislead; always question the data displayed and the choices made.
- Being critical and asking questions improves understanding and guards against misinterpretation.
Key Terms & Definitions
- Dot Plot — A graph using dots to represent individual data points in each bin.
- Stem-and-Leaf Plot — A plot splitting data values into "stems" and "leaves" to show frequency and actual values.
- Boxplot (Box-and-Whiskers Plot) — A graph displaying data quartiles, median, and potential outliers.
- Interquartile Range (IQR) — The range between the first (Q1) and third (Q3) quartiles; measures spread of the middle 50% of data.
- Outlier — A data point lying outside the typical range, often defined as outside 1.5 × IQR from the quartiles.
- Cumulative Frequency Plot — A plot showing the accumulated total of data points up to each bin.
Action Items / Next Steps
- Practice creating and interpreting dot plots, stem-and-leaf plots, boxplots, and cumulative frequency plots with sample data.
- Stay critical: always ask questions about data visualizations you encounter.