Overview
This lecture covers graphical and numerical methods for examining numerical data: scatterplots, dot plots, histograms, measures of center and variation, and box plots. It also reviews populations, samples, and study types.
Graphical Methods for Numerical Data
- A scatterplot displays case-by-case data for two numerical variables, revealing associations.
- Dot plots show individual data points for a single variable, functioning as one-variable scatterplots.
- Histograms group data into bins and display the frequency count of each bin as a bar, showing the shape of the distribution.
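To make the binning step concrete, here is a minimal Python sketch of how a histogram's frequency counts could be computed by hand. The function name histogram_counts, its parameters bin_width and start, and the data values are illustrative assumptions, not part of the lecture. Bins are treated as left-inclusive intervals, which is one common convention among several.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        # Index of the bin that contains v (left edge inclusive).
        index = int((v - start) // bin_width)
        counts[index] += 1
    # Report every bin from the first to the last occupied one,
    # including empty bins in between.
    first, last = min(counts), max(counts)
    return [(start + i * bin_width, counts.get(i, 0)) for i in range(first, last + 1)]

# Made-up data for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
width = 2
bins = histogram_counts(data, bin_width=width, start=0)

# Crude text histogram: one '#' per observation in the bin.
for left, count in bins:
    print(f"[{left}, {left + width}): {'#' * count}")
```

With these made-up values the tallest bar is the [2, 4) bin, and the single value 9 sits apart on the right, which gives the plot a long right tail.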
Distribution Shapes and Features
- Right-skewed distributions have a long tail to the right; left-skewed distributions have a long tail to the left.
- Symmetric distributions have tails of roughly equal length on both sides.
- A mode appears as a prominent peak in the distribution.
- Distributions can be unimodal (one peak), bimodal (two peaks), or multimodal (multiple peaks).
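One rough way to turn the peak-counting idea into code is to count local maxima in a histogram's bin counts. This is a crude heuristic under stated assumptions, not a standard definition: the helper names count_peaks and modality and the min_count threshold (used to ignore small bumps that are not prominent) are hypothetical, and the answer depends on the bin width chosen.

```python
def count_peaks(counts, min_count=1):
    """Count local maxima in a list of histogram bin counts."""
    # Collapse runs of equal neighboring counts so a flat-topped peak counts once.
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    # Pad with zeros so the first and last bins can qualify as peaks.
    padded = [0] + collapsed + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] >= min_count
        and padded[i] > padded[i - 1]
        and padded[i] > padded[i + 1]
    )

def modality(counts, min_count=1):
    """Label a distribution by its number of peaks (heuristic)."""
    peaks = count_peaks(counts, min_count)
    if peaks == 0:
        return "no peak found"
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"

# Made-up bin counts for illustration only.
print(modality([1, 4, 9, 5, 2]))        # unimodal
print(modality([2, 8, 3, 1, 7, 2]))     # bimodal
# A lone observation far from the rest forms a tiny bump; raising
# min_count keeps it from being treated as a second mode.
print(modality([1, 5, 3, 0, 1], min_count=2))  # unimodal
```

In practice, judging whether a peak is prominent is a visual call, so a threshold like min_count only approximates what a reader does by eye.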
Measures of Center and Variability
- The mean (average) measures the center of the data distribution.
- Variance is roughly the average squared distance from the mean, indicating spread; for a sample, the sum of squared deviations is divided by n - 1 rather than n.
- Standard deviation is the square root of the variance, representing typical distance from the mean.
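The three measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sample convention of dividing by n - 1. The function names and the data values are made up for the example. Python's standard statistics module provides equivalent mean, variance, and stdev functions.

```python
import math

def mean(values):
    """Arithmetic average: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Made-up data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", mean(data))                 # 40 / 8 = 5.0
print("variance:", sample_variance(data))  # 32 / 7
print("standard deviation:", sample_sd(data))
```

Here the squared deviations sum to 32, so the sample variance is 32/7. Dividing by n = 8 instead would give exactly 4, which is the plain average squared distance from the mean.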
Box Plots and Robust Statistics
- A box plot summarizes data using the median, interquartile range (IQR), and extreme values.
- The median (Q2 or 50th percentile) divides the data set in half.
- The IQR is Q3 minus Q1 (75th minus 25th percentiles), showing the middle spread.
- Whiskers extend from the box to the most extreme observations that lie within 1.5 × IQR of the quartiles: below Q1 for the lower whisker and above Q3 for the upper whisker.
- Outliers are observations beyond the whiskers, considered relatively extreme.
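The box plot rules above can be sketched as a short Python function. This is an illustration under stated assumptions, not the lecture's own procedure. The function name box_plot_summary and the data are made up. Quartiles are computed with statistics.quantiles using method="inclusive", which is only one of several quartile conventions, so textbooks and other software may give slightly different Q1 and Q3 for the same data.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities a box plot displays (needs at least two values)."""
    data = sorted(values)
    # Assumption: "inclusive" quartile convention; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers may reach no farther than 1.5 * IQR beyond the quartiles.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Each whisker ends at an actual observation, not at the limit itself.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Observations beyond the whisker limits are plotted individually.
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }

# Made-up data for illustration only.
summary = box_plot_summary([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this data the IQR is 1.75, so the upper limit is 4 + 1.5 × 1.75 = 6.625. The upper whisker stops at 5, the largest observation inside the limit, and 9 is flagged as an outlier.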
Populations, Samples, and Study Types
- A population is the entire group of interest; a sample is the subset of that population from which data are actually collected.
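- An explanatory variable is one suspected of explaining or causing changes in another variable; the response variable is the one it may affect. Labeling variables this way does not by itself establish causation.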
- Observational studies collect data without interfering with how the data arise; experiments actively manipulate one or more explanatory variables and record the response.
Key Terms & Definitions
- Scatterplot — a graph showing paired data for two numerical variables.
- Dot plot — a plot showing individual data points for a single variable.
- Histogram — a bar plot of binned numerical data showing frequency counts.
- Mean — the average value of a data set.
- Variance — average squared deviation from the mean.
- Standard Deviation — square root of the variance, indicating typical spread.
- Median (Q2) — the middle value splitting data in half.
- IQR — the interquartile range, Q3 – Q1.
- Outlier — an observation far outside the typical range (beyond whiskers in a box plot).
Action Items / Next Steps
- Review homework or assigned readings on graphical and numerical data summaries.
- Practice constructing and interpreting scatterplots, dot plots, histograms, and box plots.