Summarizing and graphing numerical data

Aug 26, 2025

Overview

This lecture covers key graphical and numerical methods for examining numerical data, including scatterplots, dot plots, histograms, measures of variation, and box plots.

Graphical Methods for Numerical Data

  • A scatterplot displays case-by-case data for two numerical variables, revealing associations.
  • Dot plots show individual data points for a single variable, functioning as one-variable scatterplots.
  • Histograms group data into bins and draw the count in each bin as a bar, showing the shape of the distribution.
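
To make the binning step concrete, here is a minimal sketch of how a histogram's counts could be computed by hand. The function name histogram_counts, the bin width, and the data values are made up for illustration; bins are assumed to be left-closed intervals, and empty bins are simply omitted.

```python
from collections import Counter

BIN_WIDTH = 5  # assumed bin width, chosen only for this example


def histogram_counts(values, bin_width, start=0):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        # Which bin does v belong to? Then record that bin's left edge.
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up data, not from the lecture.
data = [2, 3, 3, 5, 7, 8, 8, 9, 12, 14]
counts = histogram_counts(data, BIN_WIDTH)

# Text version of the histogram: one '#' per observation in the bin.
for left, n in counts.items():
    print(f"[{left:>2}, {left + BIN_WIDTH:>2}) {'#' * n}")
```

A dot plot of the same data would keep each of the ten values as its own point; the histogram trades that detail for a clearer view of the overall shape.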

Distribution Shapes and Features

  • Right-skewed distributions have a long tail to the right; left-skewed distributions have a long tail to the left.
  • Symmetric distributions have tails of roughly equal length on both sides.
  • A mode appears as a prominent peak in the distribution.
  • Distributions can be unimodal (one prominent peak), bimodal (two prominent peaks), or multimodal (more than two prominent peaks).
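
One rough way to make the peak and tail descriptions above concrete is to work from a list of histogram bin counts. The sketch below is only a crude illustration: count_peaks and tail_direction are hypothetical helper names, the counts are invented, and real data usually needs judgment about which peaks count as "prominent" (flat-topped peaks, for instance, are not detected here).

```python
def count_peaks(counts):
    """Count bins that are strictly taller than both neighboring bins."""
    padded = [0] + list(counts) + [0]  # treat the outside as empty bins
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


def tail_direction(counts):
    """Compare how many bins lie on each side of the tallest bin."""
    peak = counts.index(max(counts))
    left_tail = peak
    right_tail = len(counts) - 1 - peak
    if right_tail > left_tail:
        return "right-skewed"
    if left_tail > right_tail:
        return "left-skewed"
    return "roughly symmetric"


# Invented bin counts, read left to right along the horizontal axis.
unimodal = [1, 3, 6, 4, 2]
bimodal = [2, 6, 3, 1, 4, 7, 2]
long_right_tail = [5, 8, 4, 2, 1, 1]

print(count_peaks(unimodal), count_peaks(bimodal))
print(tail_direction(long_right_tail))
```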

Measures of Center and Variability

  • The mean (average) measures the center of the data distribution.
  • Variance is roughly the average squared deviation from the mean (the sample variance divides by n - 1 rather than n), indicating spread.
  • Standard deviation is the square root of the variance, representing typical distance from the mean.
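
As a worked example of these definitions, the sketch below computes the mean, variance, and standard deviation step by step and checks them against Python's standard statistics module. The data values are made up; both the divide-by-n and the divide-by-(n - 1) versions of the variance are shown because textbooks differ on which one they present first.

```python
import math
import statistics

# Made-up data, not from the lecture.
data = [4, 8, 6, 5, 7]
n = len(data)

mean = sum(data) / n                            # center of the data
squared_devs = [(x - mean) ** 2 for x in data]  # squared distance of each point from the mean

pop_variance = sum(squared_devs) / n            # plain average of the squared deviations
sample_variance = sum(squared_devs) / (n - 1)   # sample version, divides by n - 1
sample_sd = math.sqrt(sample_variance)          # back in the original units

print(mean, pop_variance, sample_variance, round(sample_sd, 3))

# The standard library agrees with the hand computation.
assert math.isclose(statistics.pvariance(data), pop_variance)
assert math.isclose(statistics.variance(data), sample_variance)
assert math.isclose(statistics.stdev(data), sample_sd)
```

Taking the square root matters for interpretation: the variance is in squared units, while the standard deviation is in the same units as the data.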

Box Plots and Robust Statistics

  • A box plot summarizes data using the median, interquartile range (IQR), and extreme values.
  • The median (Q2 or 50th percentile) divides the data set in half.
  • The IQR is Q3 minus Q1 (75th minus 25th percentiles), showing the middle spread.
  • Whiskers extend from the box to the most extreme observations that lie within 1.5 × IQR below Q1 or above Q3.
  • Outliers are observations beyond the whiskers, considered relatively extreme.
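
The box plot rules above can be sketched as a small calculation. This is one possible implementation, not a definitive one: box_plot_summary is a hypothetical helper, the data are invented, and the quartiles come from statistics.quantiles with method="inclusive", which is only one of several quartile conventions (other software may give slightly different Q1 and Q3).

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, IQR, whisker ends, and outliers for a data set."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Whiskers may reach at most 1.5 x IQR beyond the quartiles.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # most extreme value still within the lower limit
        "whisker_high": inside[-1],  # most extreme value still within the upper limit
        "outliers": outliers,
    }


# Invented data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = box_plot_summary(data)
print(summary)
```

Note that the whiskers stop at actual observations (1 and 8 here), not at the limits themselves, and the value 30 is drawn as a separate point. The median and IQR barely change if 30 is replaced by 300, which is why they are called robust statistics.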

Populations, Samples, and Study Types

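  • A population is the entire group of interest; a sample is the subset of that group that is actually observed.
  • An explanatory variable is one suspected of affecting another variable; the response variable is the one it may affect. The labels alone do not establish causation.
  • Observational studies collect data without intervening in how the data arise; experiments actively assign or manipulate the explanatory variable.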
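
The population and sample distinction can be illustrated with a short simulation. Everything here is an assumption made for the example: the population is 10,000 simulated values (with a center of 50 and a spread of 10 chosen arbitrarily), the sample size is 100, and the random seed is fixed only so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical population: every member of the whole group.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Sample: a randomly chosen subset of the population, drawn without replacement.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice only the sample is available, so the sample mean serves as an estimate of the population mean, which is usually unknown.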

Key Terms & Definitions

  • Scatterplot — a graph showing paired data for two numerical variables.
  • Dot plot — a plot showing individual data points for a single variable.
  • Histogram — a bar plot of binned numerical data showing frequency counts.
  • Mean — the average value of a data set.
  • Variance — average squared deviation from the mean.
  • Standard Deviation — square root of the variance, indicating typical spread.
  • Median (Q2) — the middle value splitting data in half.
  • IQR — the interquartile range, Q3 – Q1.
  • Outlier — an observation far outside the typical range (beyond whiskers in a box plot).

Action Items / Next Steps

  • Review homework or assigned readings on graphical and numerical data summaries.
  • Practice constructing and interpreting scatterplots, dot plots, histograms, and box plots.