Overview
This lecture covers basic statistics concepts including mean, median, mode, range, quartiles, outliers, graphical data displays, and percentiles, with step-by-step example calculations.
Measures of Central Tendency and Spread
- Arrange data in increasing order before analysis.
- Mean: Sum all values and divide by the count; e.g., the mean of [7, 7, 10, 14, 15, 23, 32] is 108 / 7 ≈ 15.43.
- Median: Middle value when data is ordered; with an even count, average the two middle values.
- Mode: The value(s) that appear most frequently; datasets can be bimodal (two modes).
- Range: Maximum minus minimum value.
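The four measures above can be checked with Python's standard `statistics` module. This is a minimal sketch that reuses the dataset from the mean example; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Dataset from the mean example above, sorted as the notes recommend.
data = sorted([7, 7, 10, 14, 15, 23, 32])

data_mean = mean(data)              # sum of values divided by the count
data_median = median(data)          # middle value; averages the two middle values for an even count
data_modes = multimode(data)        # list of every most-frequent value (two entries if bimodal)
data_range = max(data) - min(data)  # maximum minus minimum

print(f"mean   = {data_mean:.2f}")  # mean   = 15.43
print(f"median = {data_median}")    # median = 14
print(f"modes  = {data_modes}")     # modes  = [7]
print(f"range  = {data_range}")     # range  = 25
```

`multimode` returns a list, so a bimodal dataset such as [1, 1, 2, 2, 3] yields [1, 2].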
Quartiles, Interquartile Range, and Outliers
- Q1: Median of the lower half; Q2: Overall median; Q3: Median of the upper half. With an odd count, the overall median is left out of both halves.
- Interquartile Range (IQR): Q3 - Q1.
- An outlier is outside the range [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR].
- Example: Data [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27], Q1 = 8, Q2 = 13, Q3 = 16, IQR = 16 - 8 = 8. The outlier limits are 8 - 1.5 × 8 = -4 and 16 + 1.5 × 8 = 28, so 27 is not an outlier.
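The quartile and outlier rules above can be sketched in a few lines of Python. The helper names `quartiles` and `find_outliers` are hypothetical, and the code assumes the median-of-halves convention in which an odd-sized dataset drops the overall median from both halves, which is what the example in these notes implies. Other conventions exist and can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: with an odd count, the overall median is excluded
    from both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def find_outliers(values):
    """Return (outliers, (low_limit, high_limit)) using the 1.5 x IQR rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_limit or v > high_limit]
    return outliers, (low_limit, high_limit)

data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]
q1, q2, q3 = quartiles(data)
outliers, limits = find_outliers(data)

print(q1, q2, q3)   # 8 13 16
print(limits)       # (-4.0, 28.0)
print(outliers)     # []
```

Because 27 lies below the upper limit of 28, the outlier list is empty, in agreement with the worked example.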
Box-and-Whisker Plots
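- Visualize the five-number summary: minimum, Q1, median (Q2), Q3, and maximum. When outliers exist, each whisker ends at the most extreme value that is not an outlier.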
- Outliers are plotted as separate points.
- Symmetry or skewness can be seen in box plot shapes.
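As a rough illustration of what a box plot draws, the sketch below computes the box edges, whisker ends, and separately plotted outliers. The function name `box_plot_summary` and its dictionary keys are hypothetical, and the quartiles assume the median-of-halves rule with the median excluded for odd counts. The second dataset, with 40 appended, is made up purely to show an outlier.

```python
from statistics import median

def box_plot_summary(values):
    """Return the numbers a box-and-whisker plot is drawn from."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q2 = median(ordered)
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [v for v in ordered if low_limit <= v <= high_limit]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [v for v in ordered if v < low_limit or v > high_limit],
    }

notes_data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]
summary = box_plot_summary(notes_data)
print(summary)

# Hypothetical variation: appending 40 produces one outlier.
with_outlier = box_plot_summary(notes_data + [40])
print(with_outlier["whisker_high"], with_outlier["outliers"])  # 27 [40]
```

For the dataset from the notes, the whiskers run from 5 to 27 and no points are plotted separately.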
Data Distributions and Skewness
- Symmetric: Mean ≈ Median; box plot is balanced around the median.
- Skewed Right (Positive Skew): Mean is typically greater than the median; right whisker longer.
- Skewed Left (Negative Skew): Mean is typically less than the median; left whisker longer.
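The mean-versus-median comparison above can be turned into a quick check. This is only a rule-of-thumb sketch: the function name `skew_direction` and its `tolerance` parameter are hypothetical, and the last two datasets are made up for illustration.

```python
from statistics import mean, median

def skew_direction(values, tolerance=0.0):
    """Label the likely skew by comparing mean and median (rule of thumb only).

    tolerance: hypothetical cutoff for how close mean and median must be
    before the data is called symmetric.
    """
    diff = mean(values) - median(values)
    if abs(diff) <= tolerance:
        return "symmetric"
    return "skewed right" if diff > 0 else "skewed left"

print(skew_direction([7, 7, 10, 14, 15, 23, 32]))  # skewed right (mean 15.43 > median 14)
print(skew_direction([1, 2, 3, 4, 5]))             # symmetric (mean 3 = median 3)
print(skew_direction([1, 8, 9, 10]))               # skewed left (mean 7 < median 8.5)
```

A plot is still the more reliable way to judge shape, since the comparison can mislead on small or unusual datasets.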
Data Displays
- Dot Plot: Each data point is a dot above its value.
- Stem-and-Leaf Plot: Split each value into a "stem" (leading digit(s)) and a "leaf" (last digit); e.g., 27 has stem 2 and leaf 7.
- Frequency Table: List values and their counts.
- Histogram: Bar graph for grouped data; bars are adjacent.
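Two of the displays above are easy to sketch as plain text. The helpers `stem_and_leaf` and `dot_plot` are hypothetical names, and the stem-and-leaf version assumes non-negative whole numbers so that the last digit can serve as the leaf.

```python
from collections import Counter, defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).

    Assumes non-negative integers.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

def dot_plot(values):
    """Return one text line per distinct value, with one mark per occurrence."""
    counts = Counter(values)
    return [f"{value:>3} | {'*' * counts[value]}" for value in sorted(counts)]

data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 0 | 5 7 8
# 1 | 0 1 3 4 6 6 7
# 2 | 7

print("\n".join(dot_plot(data)))
```

The `Counter` built inside `dot_plot` is itself a frequency table: it maps each value to its count.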
Percentiles and Frequency Tables
- Frequency: Count of each value.
- Relative Frequency: Frequency divided by total count.
- Cumulative Relative Frequency: Running total of relative frequency.
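- Percentile: Read from the cumulative relative frequency column. The pth percentile is the first value whose cumulative relative frequency exceeds p%; if a value's cumulative relative frequency equals p% exactly, average that value and the next one.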
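The table columns and the percentile lookup described above can be sketched as follows. The names `frequency_table` and `percentile_from_table` are hypothetical. The lookup assumes one common textbook convention: take the first value whose cumulative frequency passes p% of the data, and average with the next value when the cumulative frequency lands exactly on p%. Other percentile conventions can give different answers.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]  # running total of counts avoids float drift
        rows.append((value, counts[value], counts[value] / total, running / total))
    return rows

def percentile_from_table(values, p):
    """Return the p-th percentile (0 < p < 100) under the assumed convention."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    counts = Counter(values)
    total = len(values)
    distinct = sorted(counts)
    running = 0
    for i, value in enumerate(distinct):
        running += counts[value]
        # Compare counts scaled by 100 so the test stays in whole numbers.
        if running * 100 > p * total:
            return value
        if running * 100 == p * total:
            return (value + distinct[i + 1]) / 2

data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]

for row in frequency_table(data):
    print(row)

print(percentile_from_table(data, 25))  # 8
print(percentile_from_table(data, 50))  # 13
print(percentile_from_table(data, 75))  # 16
```

For this dataset the 25th, 50th, and 75th percentiles match Q1, Q2, and Q3 from the quartile example. With a made-up dataset such as [1, 2, 3, 4], the 50th percentile lands exactly between 2 and 3 and is returned as 2.5.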
Key Terms & Definitions
- Mean — Average of all data points.
- Median — Middle value when data is ordered.
- Mode — The most common value(s) in a dataset.
- Range — Difference between highest and lowest values.
- Quartiles (Q1, Q2, Q3) — Values dividing data into four equal parts.
- Interquartile Range (IQR) — Distance between Q3 and Q1.
- Outlier — Value more than 1.5 × IQR below Q1 or above Q3.
- Box-and-Whisker Plot — Graph showing minimum, quartiles, and maximum (with potential outliers).
- Dot Plot — Graph with dots representing data points.
- Stem-and-Leaf Plot — Diagram for organizing data by place value.
- Frequency Table — Table showing count of each value.
- Histogram — Bar graph for grouped continuous data.
- Percentile — Value below which a given percent of data falls.
Action Items / Next Steps
- Practice with new datasets: calculate mean, median, mode, range, quartiles, and IQR.
- Create box plots, dot plots, stem-and-leaf plots, frequency tables, and histograms for additional sets.
- Use cumulative relative frequency tables to find various percentiles.