Overview
This lecture covers the core concepts of AP Statistics Unit 1: exploring and describing one-variable data, including data types, graphical displays, measures of center and spread, outlier detection, transformations, box plots, comparison of distributions, and the basics of the normal distribution.
Types of Data
- Variables can be categorical (group labels) or quantitative (numerical values).
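- Categorical data are typically words; quantitative data are typically numbers. Numbers used only as labels (e.g., zip codes) are still categorical.
- Quantitative variables are either discrete (countable values, such as number of siblings) or continuous (measured values that can fall anywhere in an interval, such as height).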
Summarizing Categorical Data
- Categorical data can be summarized using frequency tables and relative frequency tables.
- Graphs for categorical data include bar graphs and pie charts.
- Describe categorical distributions by noting most/least common categories and comparing groups.
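As a rough illustration of the frequency and relative frequency tables described above, here is a minimal Python sketch. The survey responses are made up for this example and are not from the lecture.

```python
from collections import Counter

# Hypothetical sample: favorite lunch choice for 10 students (made-up data).
responses = ["pizza", "salad", "pizza", "tacos", "pizza",
             "salad", "tacos", "pizza", "salad", "pizza"]

counts = Counter(responses)          # frequency table: category -> count
n = len(responses)

# Relative frequency = count divided by the total number of responses.
relative = {category: count / n for category, count in counts.items()}

# Print categories from most to least common.
for category, count in counts.most_common():
    print(f"{category:<6} freq={count}  rel={relative[category]:.2f}")
```

The most and least common categories can be read directly from the first and last printed rows.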
Summarizing Quantitative Data
- Quantitative data can be grouped into frequency tables using equal-width bins (intervals).
- Graphical displays: dot plots, stem-and-leaf plots, histograms, and cumulative frequency graphs.
- Histograms display frequency (counts) or relative frequency (proportions) per bin.
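The grouping behind a histogram can be sketched in a few lines of Python. The quiz scores, the bin width of 10, and the starting edge of 50 are all assumptions chosen for this example.

```python
# Hypothetical data: quiz scores (made up for illustration).
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 79, 82, 85, 88, 93, 97]

bin_width = 10      # every bin has the same width
start = 50          # left edge of the first bin
num_bins = 5        # bins: [50,60), [60,70), [70,80), [80,90), [90,100)

counts = [0] * num_bins
for x in scores:
    index = (x - start) // bin_width   # which bin this value falls into
    counts[index] += 1

n = len(scores)
for i, count in enumerate(counts):
    left = start + i * bin_width
    # Frequency is the count; relative frequency is count / n.
    print(f"[{left}, {left + bin_width})  freq={count}  rel={count / n:.2f}  {'*' * count}")
```

The row of asterisks is a crude text histogram; a graphing tool would draw the same counts as bar heights.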
Describing Distributions
- Describe quantitative data distributions by shape (e.g., symmetric, skewed), center (mean/median), spread (range/IQR/standard deviation), and outliers.
- Shapes include unimodal, bimodal, symmetric, skewed right/left, and uniform.
- Always describe distributions in context (e.g., "tree height in feet").
Measures of Center & Spread
- Mean: arithmetic average; sensitive to outliers.
- Median: middle value of the ordered data; resistant to outliers. Its position is (n+1)/2; when n is even, average the two middle values.
- Range: max - min; affected by outliers.
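- IQR: Q3 - Q1; the range of the middle 50% of the data; resistant to outliers.
- Standard deviation: measures the typical distance of values from the mean (the square root of the average squared deviation); sensitive to outliers.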
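The measures above can be computed with Python's standard `statistics` module. This is a sketch under two assumptions: the data set is made up, and quartiles are found as medians of the lower and upper halves (excluding the median when n is odd), which is one common convention; other tools may give slightly different quartiles. The helper name `quartiles` is hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumes at least 2 values; the median itself is excluded when n is odd.
    """
    s = sorted(values)
    half = len(s) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])

# Hypothetical data with one unusually large value (30).
data = [3, 5, 6, 7, 8, 9, 10, 12, 30]

mean = statistics.mean(data)
median = statistics.median(data)
data_range = max(data) - min(data)
q1, q3 = quartiles(data)
iqr = q3 - q1
sd = statistics.stdev(data)   # sample standard deviation (divides by n - 1)

print(f"mean={mean}, median={median}, range={data_range}, IQR={iqr}, SD={sd:.2f}")
```

Here the single large value pulls the mean (10) above the median (8) and inflates the range and standard deviation, while the IQR is unaffected by it.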
Percentiles & Quartiles
- Percentile: percent of data at or below a value.
- Q1: 25th percentile; Median: 50th percentile; Q3: 75th percentile.
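A value's percentile can be computed directly from the definition above. The function name `percentile_of` and the height data are hypothetical, chosen only for this sketch.

```python
def percentile_of(value, data):
    """Percent of observations in data that are at or below value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical data: heights in cm for 10 students (made up).
heights = [150, 155, 158, 160, 162, 165, 168, 170, 175, 180]

# 6 of the 10 heights are at or below 165.
print(percentile_of(165, heights))
```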
Outlier Detection
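- Fence method: lower fence = Q1 - 1.5 × IQR, upper fence = Q3 + 1.5 × IQR; values below the lower fence or above the upper fence are outliers.
- Mean/SD method: outliers are values more than 2 standard deviations above or below the mean.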
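Both outlier rules can be checked with a short Python sketch. The data are made up, the helper `quartiles` is hypothetical, and quartiles are assumed to be medians of the lower and upper halves; a calculator using a different quartile convention could place the fences slightly differently.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (n >= 2)."""
    s = sorted(values)
    half = len(s) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])

# Hypothetical data (made up for illustration).
data = [3, 5, 6, 7, 8, 9, 10, 12, 30]

# Fence method: flag values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR.
q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
fence_outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Mean/SD method: flag values more than 2 SDs from the mean.
mean = statistics.mean(data)
sd = statistics.stdev(data)   # sample standard deviation
sd_outliers = [x for x in data if abs(x - mean) > 2 * sd]

print(f"fences: ({lower_fence}, {upper_fence}), outliers: {fence_outliers}")
print(f"mean/SD outliers: {sd_outliers}")
```

For this data set both methods flag only the value 30, but the two rules do not always agree.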
Transforming Data
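- Adding/subtracting a constant shifts measures of center and position (mean, median, quartiles) by that constant; measures of spread are unchanged.
- Multiplying by a constant multiplies measures of center and position by that constant and measures of spread by its absolute value.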
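The effect of each transformation can be verified numerically. In this sketch the tree heights, the added constant 5, and the multiplier 12 (feet to inches) are assumptions made for illustration.

```python
import statistics

def summarize(values):
    """Return (mean, median, sample SD) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

# Hypothetical tree heights in feet (made up).
heights_ft = [10, 15, 20, 25, 30]

shifted = [x + 5 for x in heights_ft]    # add a constant
scaled = [x * 12 for x in heights_ft]    # multiply by a constant (feet -> inches)

orig_mean, orig_median, orig_sd = summarize(heights_ft)
shift_mean, shift_median, shift_sd = summarize(shifted)
scale_mean, scale_median, scale_sd = summarize(scaled)

print("original:", orig_mean, orig_median, round(orig_sd, 3))
print("plus 5:  ", shift_mean, shift_median, round(shift_sd, 3))   # center moves, SD same
print("times 12:", scale_mean, scale_median, round(scale_sd, 3))   # center and SD both scale
```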
Box Plots & Five Number Summary
- Five number summary: min, Q1, median, Q3, max.
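- Modified box plots mark outliers as individual points and extend the whiskers to the most extreme non-outlier values.
- Each of the four sections between consecutive five-number-summary values contains about 25% of the data.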
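The numbers needed to draw a modified box plot can be sketched as follows. The tree heights are made up, the function name `five_number_summary` is hypothetical, and quartiles are assumed to be medians of the lower and upper halves.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least 2 values."""
    s = sorted(values)
    half = len(s) // 2
    q1 = statistics.median(s[:half])
    q3 = statistics.median(s[-half:])
    return s[0], q1, statistics.median(s), q3, s[-1]

# Hypothetical tree heights in feet (made up).
data = [18, 22, 25, 27, 30, 31, 33, 36, 40, 62]

summary = five_number_summary(data)
minimum, q1, median, q3, maximum = summary

# For a modified box plot, whiskers stop at the most extreme non-outliers.
iqr = q3 - q1
inside = [x for x in data if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
outliers = [x for x in data if x not in inside]
whisker_low, whisker_high = min(inside), max(inside)

print("five-number summary:", summary)
print("whiskers:", whisker_low, "to", whisker_high, "| outliers:", outliers)
```

In this example the upper whisker ends at 40 rather than at the maximum, and 62 would be plotted as a separate point.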
Comparing Distributions
- Use comparative language: compare shape, center, spread, and outliers in context.
- Parallel box plots and histograms allow for visual distribution comparisons.
Normal Distribution & Z-Scores
- Normal distribution: symmetric, unimodal, bell-shaped; described by mean (μ) and SD (σ).
- Empirical rule: approximately 68% of values fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.
- Z-score: z = (value - mean)/SD; measures position in SDs from the mean (positive above the mean, negative below).
- Find proportions or percentiles using z-scores and technology (calculator, Desmos, or tables).
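As one possible "technology" option, Python's standard library includes `statistics.NormalDist` (Python 3.8+). The sketch below assumes a hypothetical normal model for heights with mean 64 and SD 2.5; those numbers are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical model: heights in inches, mean 64 and SD 2.5 (assumed values).
mu, sigma = 64.0, 2.5
heights = NormalDist(mu, sigma)

# Z-score: how many SDs a value lies from the mean.
value = 69.0
z = (value - mu) / sigma

# Proportion of the distribution at or below the value (its percentile / 100).
proportion_below = heights.cdf(value)

# Check the empirical rule by computing the area within 1, 2, and 3 SDs.
within_1 = heights.cdf(mu + sigma) - heights.cdf(mu - sigma)
within_2 = heights.cdf(mu + 2 * sigma) - heights.cdf(mu - 2 * sigma)
within_3 = heights.cdf(mu + 3 * sigma) - heights.cdf(mu - 3 * sigma)

# Inverse direction: the value at the 90th percentile.
cutoff_90 = heights.inv_cdf(0.90)

print(f"z = {z:.2f}, proportion below = {proportion_below:.4f}")
print(f"within 1, 2, 3 SDs: {within_1:.4f}, {within_2:.4f}, {within_3:.4f}")
print(f"90th percentile = {cutoff_90:.2f}")
```

A graphing calculator's normal CDF and inverse normal functions should give matching values up to rounding.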
Key Terms & Definitions
- Categorical variable — group or category label (not measured numerically).
- Quantitative variable — numeric, measured or counted.
- Discrete — countable outcomes.
- Continuous — can take any value within a range (infinitely many possible values).
- Distribution — pattern of values and how often they occur.
- Mean — arithmetic average.
- Median — middle value.
- IQR — range of middle 50% of data (Q3 - Q1).
- Standard deviation — typical distance of values from the mean.
- Percentile — percent of data at or below a value.
- Outlier — value far from most others, determined by set rules.
- Box plot — graph showing five-number summary and outliers.
- Normal distribution — symmetric, bell-shaped model for continuous data.
- Z-score — standardized score showing distance from mean in SD units.
Action Items / Next Steps
- Complete and review the Unit 1 study guide for practice and reinforcement.
- Use answer keys to check your understanding.
- Practice interpreting and creating graphs, comparing distributions, finding summary statistics, and solving normal distribution problems.
- Prepare examples using your calculator/technology for normal distribution calculations.