
Basic Statistics Concepts

Sep 1, 2025

Overview

This lecture covers basic statistics concepts including mean, median, mode, range, quartiles, outliers, graphical data displays, and percentiles, with step-by-step example calculations.

Measures of Central Tendency and Spread

  • Sort the data in increasing order first; the median, quartiles, and range all depend on that ordering.
  • Mean: Sum all values and divide by the count; e.g., the mean of [7, 7, 10, 14, 15, 23, 32] is 108 ÷ 7 ≈ 15.43.
  • Median: The middle value of the ordered data; with an even count, average the two middle values.
  • Mode: The value(s) that appear most frequently; datasets can be bimodal (two modes).
  • Range: Maximum minus minimum value.
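
A minimal Python sketch of these four measures, using the standard-library `statistics` module on the mean example above. The variable names are illustrative, and the bimodal list [1, 1, 2, 2, 3] is a made-up example.

```python
import statistics

data = [7, 7, 10, 14, 15, 23, 32]  # example from the mean bullet

mean = statistics.mean(data)          # sum / count = 108 / 7
median = statistics.median(data)      # middle value of the sorted data
modes = statistics.multimode(data)    # every most-frequent value
data_range = max(data) - min(data)    # maximum minus minimum

print(round(mean, 2), median, modes, data_range)  # 15.43 14 [7] 25

# Hypothetical bimodal data: multimode returns both modes.
print(statistics.multimode([1, 1, 2, 2, 3]))      # [1, 2]
```

`multimode` is used instead of `mode` because it reports every tied value, which covers the bimodal case.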

Quartiles, Interquartile Range, and Outliers

  • Q1: Median of the lower half; Q2: Overall median; Q3: Median of the upper half (for an odd count, leave the overall median out of both halves).
  • Interquartile Range (IQR): Q3 - Q1.
  • An outlier is any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
  • Example: Data [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27] gives Q1=8, Q2=13, Q3=16, and IQR=8. The fences are 8 - 12 = -4 and 16 + 12 = 28, so 27 is not an outlier.
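
The quartile and outlier steps can be sketched in Python as below. This assumes the median-of-halves method with the overall median excluded from both halves when the count is odd, which reproduces Q1=8 and Q3=16 above; other textbooks and libraries (for example `statistics.quantiles`) use different conventions and can give slightly different quartiles. The helper names `quartiles` and `outlier_fences` are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) by the median-of-halves method (needs at least 2 values)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # Odd count: skip the middle value so it belongs to neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def outlier_fences(q1, q3):
    """Lower and upper fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]
q1, q2, q3 = quartiles(data)
low, high = outlier_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]
print(q1, q2, q3, q3 - q1)  # 8 13 16 8
print(low, high, outliers)  # -4.0 28.0 []
```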

Box-and-Whisker Plots

  • Show the minimum, Q1, median (Q2), Q3, and maximum; when outliers are present, the whiskers usually stop at the most extreme non-outlier values.
  • Outliers are plotted as separate points.
  • Symmetry or skewness can be seen in box plot shapes.
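
A sketch of the numbers a box-and-whisker plot draws, assuming the common convention that whiskers end at the most extreme values inside the 1.5 × IQR fences. The quartiles are passed in rather than recomputed (8, 13, 16 from the quartile example); `box_plot_summary` is a hypothetical helper, and the second call swaps 27 for a made-up 35 to show an outlier being split off.

```python
def box_plot_summary(data, q1, q2, q3):
    """Return (whisker_low, Q1, Q2, Q3, whisker_high, outliers) for a box plot."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach the most extreme values that are not outliers.
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Outliers are drawn as separate points.
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return min(inside), q1, q2, q3, max(inside), outliers

lecture = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]
print(box_plot_summary(lecture, 8, 13, 16))   # (5, 8, 13, 16, 27, [])

# Hypothetical variant: 27 replaced by 35; the quartiles stay 8, 13, 16.
variant = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 35]
print(box_plot_summary(variant, 8, 13, 16))   # (5, 8, 13, 16, 17, [35])
```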

Data Distributions and Skewness

  • Symmetric: Mean ≈ Median; the box plot is balanced.
  • Skewed Right (Positive Skew): Mean is usually greater than the median; the right whisker is longer.
  • Skewed Left (Negative Skew): Mean is usually less than the median; the left whisker is longer.
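
The mean-versus-median comparison can be sketched as a small Python check. Treat it as a rule of thumb rather than a definitive test, since real datasets can break it; the `skew_direction` name and its `tolerance` parameter are assumptions for illustration.

```python
from statistics import mean, median

def skew_direction(data, tolerance=0.0):
    """Guess the skew from mean - median (rule of thumb only)."""
    diff = mean(data) - median(data)
    if abs(diff) <= tolerance:
        return "symmetric"
    # A mean pulled above the median suggests a long right tail.
    return "right" if diff > 0 else "left"

print(skew_direction([7, 7, 10, 14, 15, 23, 32]))  # right (mean ~15.43 > median 14)
print(skew_direction([1, 2, 3, 4, 5]))             # symmetric
print(skew_direction([1, 10, 11, 12]))             # left (mean 8.5 < median 10.5)
```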

Data Displays

  • Dot Plot: Each data point is a dot above its value.
  • Stem-and-Leaf Plot: Split data into "stem" (leading digit(s)) and "leaf" (last digit).
  • Frequency Table: List values and their counts.
  • Histogram: Bar graph for grouped data; bars are adjacent.
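
Text versions of two of these displays can be sketched in Python, assuming non-negative whole numbers, a stem equal to the tens digit(s), and histogram bins 10 units wide (a hypothetical choice; any width works). The function names are illustrative only, and the data is the quartile example.

```python
from collections import Counter, defaultdict

def stem_and_leaf(data):
    """Stem = tens digit(s), leaf = ones digit; assumes non-negative integers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    # Include empty stems so gaps in the data stay visible.
    return [
        f"{stem} | " + " ".join(str(leaf) for leaf in leaves.get(stem, []))
        for stem in range(min(leaves), max(leaves) + 1)
    ]

def histogram_counts(data, width=10):
    """Count values in adjacent bins [k*width, (k+1)*width)."""
    bins = Counter(x // width for x in data)
    return {(k * width, (k + 1) * width): bins[k]
            for k in range(min(bins), max(bins) + 1)}

data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]
print("\n".join(stem_and_leaf(data)))
# 0 | 5 7 8
# 1 | 0 1 3 4 6 6 7
# 2 | 7
print(histogram_counts(data))  # {(0, 10): 3, (10, 20): 7, (20, 30): 1}
```

With a bin width of 10, the histogram bar heights equal the leaf counts on each stem, which is why a stem-and-leaf plot turned on its side looks like a histogram.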

Percentiles and Frequency Tables

  • Frequency: Count of each value.
  • Relative Frequency: Frequency divided by total count.
  • Cumulative Relative Frequency: Running total of relative frequency.
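  • Percentile: the pth percentile is the first value whose cumulative relative frequency exceeds p%; if some value's cumulative relative frequency equals p% exactly, the percentile falls between that value and the next one, so average the two.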
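
A Python sketch of a frequency table with relative and cumulative relative frequencies, plus a percentile lookup that follows the averaging rule in these notes. Exact `Fraction` values are used so a cumulative frequency landing exactly on p% is detected without floating-point error. The function names and this particular percentile convention are assumptions; other sources define percentiles differently.

```python
from collections import Counter
from fractions import Fraction

def frequency_table(data):
    """Rows of (value, frequency, relative freq, cumulative relative freq)."""
    counts = Counter(data)
    n = len(data)
    rows, running = [], Fraction(0)
    for value in sorted(counts):
        rel = Fraction(counts[value], n)  # frequency / total count
        running += rel                    # running total of relative frequency
        rows.append((value, counts[value], rel, running))
    return rows

def percentile(data, p):
    """p-th percentile (integer p from 0 to 100) read from the cumulative column."""
    target = Fraction(p, 100)
    rows = frequency_table(data)
    for i, (value, _, _, cum) in enumerate(rows):
        if cum == target and i + 1 < len(rows):
            # Percentile falls between this value and the next: average them.
            return (value + rows[i + 1][0]) / 2
        if cum >= target:
            return value

data = [5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27]
for value, freq, rel, cum in frequency_table(data):
    print(f"{value:>3} {freq:>3} {float(rel):.3f} {float(cum):.3f}")
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 8 13 16
print(percentile([1, 2, 3, 4], 50))  # 2.5
```

On the quartile example, this rule gives 8, 13, and 16 for the 25th, 50th, and 75th percentiles, matching Q1, Q2, and Q3; on an even-length list the 50th percentile averages the two middle values, just like the median.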

Key Terms & Definitions

  • Mean — Average of all data points.
  • Median — Middle value when data is ordered.
  • Mode — The most common value(s) in a dataset.
  • Range — Difference between highest and lowest values.
  • Quartiles (Q1, Q2, Q3) — Values dividing data into four equal parts.
  • Interquartile Range (IQR) — Distance between Q3 and Q1.
  • Outlier — Value more than 1.5 × IQR below Q1 or above Q3.
  • Box-and-Whisker Plot — Graph showing minimum, quartiles, and maximum (with potential outliers).
  • Dot Plot — Graph with dots representing data points.
  • Stem-and-Leaf Plot — Diagram for organizing data by place value.
  • Frequency Table — Table showing count of each value.
  • Histogram — Bar graph for grouped continuous data.
  • Percentile — Value below which a given percent of data falls.

Action Items / Next Steps

  • Practice with new datasets: calculate mean, median, mode, range, quartiles, and IQR.
  • Create box plots, dot plots, stem-and-leaf plots, frequency tables, and histograms for additional sets.
  • Use cumulative relative frequency tables to find various percentiles.