Distribution Overview and Types

Jun 13, 2025

Overview

This lecture introduces statistical distributions, explains their importance in data analysis, covers common distribution types, and demonstrates how to visualize and interpret distributions using Python.

Introduction to Distributions

  • A distribution describes how data values are spread across their range and how often each value (or range of values) occurs.
  • Understanding distributions is essential for data analysis, statistical inference, and machine learning.
  • Statistical inference uses samples to estimate population parameters, because collecting data on an entire population is usually impractical.
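
The idea of estimating a population parameter from a sample can be sketched as follows. This is a minimal illustration and not part of the lecture notebook: the population is synthetic, and the mean of 50, standard deviation of 10, and sample size of 500 are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical "population": 100,000 synthetic values (illustrative only).
population = [random.gauss(50, 10) for _ in range(100_000)]

# Draw a much smaller random sample, without replacement.
sample = random.sample(population, 500)

# The sample mean serves as an estimate of the population mean.
population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two printed values should be close but not identical; the gap is the sampling error that statistical inference tries to quantify.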

Visualizing Distributions

  • Histograms group data values into intervals (bins) and show how many values fall in each; the y-axis represents counts or relative frequencies.
  • Symmetric distributions have bars that mirror each other around the center; the normal distribution is the most familiar example.
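
To make the binning behind a histogram concrete, here is a small sketch using NumPy. The data are synthetic and the choice of 10 bins is an assumption for illustration. A plotting library would draw one bar per bin from these same counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, roughly symmetric data (illustrative assumption).
data = rng.normal(loc=0.0, scale=1.0, size=1_000)

# Split the data range into 10 equal-width bins and count values per bin.
counts, edges = np.histogram(data, bins=10)

# Each bin covers [left, right); the last bin also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

For symmetric data like this, the counts should rise toward the middle bins and fall off in a similar way on both sides.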

Importance and Types of Distribution Shapes

  • Distributions help summarize and model datasets as well as validate statistical assumptions.
  • Common shapes: symmetric (bell curve/normal), uniform (equal frequency), bimodal (two peaks), skewed left/right (longer tail on one side).
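  • Shape affects the relative positions of mean, median, and mode: in a symmetric, single-peaked distribution they roughly coincide, while in a skewed distribution the mean is typically pulled toward the longer tail.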
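
The effect of shape on the mean and median can be checked numerically. The sketch below uses synthetic data: a normal sample stands in for a symmetric distribution and an exponential sample for a right-skewed one. Both choices and all parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric example: normal data centered at 0.
symmetric = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Right-skewed example: exponential data with a long tail to the right.
right_skewed = rng.exponential(scale=1.0, size=10_000)

sym_mean, sym_median = symmetric.mean(), np.median(symmetric)
skew_mean, skew_median = right_skewed.mean(), np.median(right_skewed)

print(f"symmetric:    mean={sym_mean:.3f}  median={sym_median:.3f}")
print(f"right-skewed: mean={skew_mean:.3f}  median={skew_median:.3f}")
```

In the symmetric case the two statistics nearly agree; in the right-skewed case the mean sits above the median because the long tail pulls it upward.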

Exploring Dataset Distributions in Python

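  • When no dataset is available, build small examples from Python lists or generate synthetic data.
  • Methods such as .unique() and .value_counts() (available on pandas Series) list the distinct values and count how often each occurs, which helps summarize categorical variables.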
  • Histograms can reveal data concentration, skewness, and possible outliers.
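
A short sketch of summarizing a categorical variable with pandas follows. The color values are made up for illustration and are not taken from the lecture's dataset.

```python
import pandas as pd

# Hypothetical categorical data built from a plain Python list.
colors = pd.Series(["red", "blue", "red", "green", "blue", "red"])

# Distinct values, in order of first appearance.
distinct = colors.unique()

# Frequency of each value, sorted from most to least common.
counts = colors.value_counts()

print(distinct)
print(counts)
```

The output of value_counts() is effectively the distribution of a categorical variable: each category paired with its frequency.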

Discrete Distributions

  • Discrete distributions model countable outcomes (e.g., dice rolls).
  • Uniform discrete: all outcomes equally likely (e.g., fair die).
  • Bernoulli: two outcomes (success/failure).
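  • Binomial: number of successes in a fixed number n of independent trials, each with the same success probability.
  • Geometric: number of trials needed to get the first success.
  • Poisson: number of events occurring in a fixed interval of time or space, given a constant average rate.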
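
The discrete distributions listed above can each be sampled with NumPy's random generator. This is a sketch under assumed parameters (success probability 0.3, 10 trials, an average rate of 4 events per interval); none of these values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
size = 10_000

# Discrete uniform: a fair six-sided die (upper bound 7 is exclusive).
die = rng.integers(1, 7, size=size)

# Bernoulli: a binomial with a single trial, giving 0 (failure) or 1 (success).
bernoulli = rng.binomial(n=1, p=0.3, size=size)

# Binomial: number of successes in 10 independent trials.
binomial = rng.binomial(n=10, p=0.3, size=size)

# Geometric: number of trials up to and including the first success (starts at 1).
geometric = rng.geometric(p=0.3, size=size)

# Poisson: number of events per interval at an average rate of 4.
poisson = rng.poisson(lam=4.0, size=size)

print("binomial mean:", binomial.mean())    # theory: n * p = 3
print("geometric mean:", geometric.mean())  # theory: 1 / p, about 3.33
print("poisson mean:", poisson.mean())      # theory: lam = 4
```

Comparing each sample mean with its theoretical value is a quick sanity check that the sample behaves like the named distribution.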

Continuous Distributions

  • Continuous distributions model outcomes that can take any of infinitely many values within an interval (e.g., heights or waiting times).
  • Uniform continuous: every value in the interval has the same probability density.
  • Normal distribution: bell-shaped curve, defined by mean and standard deviation.
  • Student's t, exponential, gamma, and beta are other important continuous distributions.
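
The continuous distributions named above can be sampled in the same way with NumPy. All parameters below (interval, mean, standard deviation, degrees of freedom, shape values) are assumptions chosen only to illustrate the calls.

```python
import numpy as np

rng = np.random.default_rng(3)
size = 10_000

# Continuous uniform on the interval [0, 1).
uniform = rng.uniform(low=0.0, high=1.0, size=size)

# Normal with mean 5 and standard deviation 2.
normal = rng.normal(loc=5.0, scale=2.0, size=size)

# Exponential with scale (mean) 1.5.
exponential = rng.exponential(scale=1.5, size=size)

# Student's t with 5 degrees of freedom.
student_t = rng.standard_t(df=5, size=size)

# Gamma with shape 2 and scale 1; beta with parameters a=2, b=5.
gamma = rng.gamma(shape=2.0, scale=1.0, size=size)
beta = rng.beta(a=2.0, b=5.0, size=size)

print("normal mean and std:", normal.mean(), normal.std())
print("uniform range:", uniform.min(), uniform.max())
```

Passing any of these arrays to a histogram function is a convenient way to see the characteristic shape of each distribution.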

Practical Use and Statistical Testing

  • Many statistical tests (e.g., t-tests) rely on assumptions about the underlying distribution of the data.
  • It's important to verify distribution assumptions before applying these tests.
  • Overlaying theoretical distributions (e.g., normal) on data enables visual comparison, but formal tests are used for confirmation.
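
The comparison between data and a theoretical normal curve might look like the sketch below. The data are synthetic, and the Shapiro-Wilk test is used here only as one example of a formal normality test; the lecture does not name a specific test, so that choice is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Synthetic data standing in for a real variable (illustrative assumption).
data = rng.normal(loc=10.0, scale=3.0, size=500)

# Density histogram: bar heights are scaled so the total bar area is 1,
# which puts the histogram on the same scale as a probability density.
density, edges = np.histogram(data, bins=20, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Normal curve with mean and standard deviation estimated from the data,
# evaluated at the bin centers for a like-for-like comparison.
mu, sigma = data.mean(), data.std(ddof=1)
normal_pdf = stats.norm.pdf(centers, loc=mu, scale=sigma)

# Formal check: Shapiro-Wilk test of the hypothesis that the data are normal.
statistic, p_value = stats.shapiro(data)

print(f"largest gap between histogram and curve: {np.abs(density - normal_pdf).max():.4f}")
print(f"Shapiro-Wilk statistic={statistic:.4f}, p-value={p_value:.4f}")
```

Drawing density as bars and normal_pdf as a line over the same axis gives the visual overlay. A small p-value would be evidence against normality, while a large one only means the test found no clear departure from it.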

Key Terms & Definitions

  • Distribution — The way data values are spread or arranged.
  • Histogram — A plot showing frequency of data in intervals.
  • Statistical Inference — Drawing conclusions about populations from samples.
  • Symmetric Distribution — Both halves mirror each other around the center.
  • Skewness — Asymmetry in data distribution; can be left (tail on left) or right (tail on right).
  • Discrete Distribution — Deals with countable outcomes.
  • Continuous Distribution — Deals with outcomes that can take any of infinitely many values within an interval.
  • Normal Distribution — Bell-shaped, symmetric distribution defined by mean and standard deviation.
  • Bernoulli/Binomial/Geometric/Poisson — Specific types of discrete distributions.

Action Items / Next Steps

  • Review notebook functions for creating and analyzing distributions in Python.
  • Practice generating histograms and identifying distribution types on given datasets.
  • Attempt the exercise: plot and interpret the distribution of car mileage in the provided dataset.
  • Prepare for future lessons on statistical testing of distribution assumptions.