Probability Theory and Distributions

Oct 24, 2025

Overview

This lecture introduces random variables, their probability distributions and statistical properties (expectation, variance, covariance), and the common distributions that recur throughout machine learning and data science.

Random Variables

  • A random variable maps outcomes of an experiment to real numbers, each with an associated probability.
  • Examples: The sum when rolling three dice or the number of heads in three coin tosses.
  • The range of a random variable is all possible values it can take.
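The three-dice example above can be made concrete by enumerating the sample space explicitly (a small sketch; variable names are illustrative):

```python
from itertools import product
from collections import Counter

# Sample space: all ordered outcomes of rolling three fair dice
outcomes = list(product(range(1, 7), repeat=3))  # 6^3 = 216 outcomes

# Random variable X: maps each outcome to the sum of the three dice
X = [sum(o) for o in outcomes]

# Range of X: all values it can take (3 through 18)
print(sorted(set(X)))

# Induced probabilities: P(X = x) = (# outcomes mapping to x) / 216
counts = Counter(X)
p = {x: c / len(outcomes) for x, c in counts.items()}
print(p[10])  # 27/216 = 0.125
```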

Probability Functions

  • The induced probability function assigns probabilities to the values that a random variable can take.
  • The cumulative distribution function (CDF), F_X(x), is the probability that X ≤ x.
  • A valid CDF is non-decreasing, has limits 0 at -∞ and 1 at ∞, and is right-continuous.
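These CDF properties can be checked on a concrete pmf, e.g. the sum of three dice (a sketch using exact fractions to avoid rounding):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# pmf of the sum of three dice, with exact probabilities
outcomes = list(product(range(1, 7), repeat=3))
counts = Counter(sum(o) for o in outcomes)
p = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

def F(x):
    """CDF F_X(x) = P(X <= x): sum the pmf over all values <= x."""
    return sum(prob for v, prob in p.items() if v <= x)

print(F(2))   # 0 (below the minimum possible sum of 3)
print(F(18))  # 1 (the whole sample space)

# Non-decreasing across the support:
assert all(F(x) <= F(x + 1) for x in range(2, 18))
```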

Types of Random Variables and Functions

  • Discrete random variable: described by a probability mass function (pmf), which is non-negative and sums to 1 over the range.
  • Continuous random variable: described by a probability density function (pdf), which is non-negative and integrates to 1.
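Both validity conditions can be verified numerically (a sketch; the pmf table and the pdf f(x) = 2x on [0, 1] are assumed toy examples, not from the lecture):

```python
# Discrete: a pmf must be non-negative and sum to 1
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
assert all(q >= 0 for q in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-12

# Continuous: a pdf must be non-negative and integrate to 1
# Toy pdf: f(x) = 2x on [0, 1]
f = lambda x: 2 * x
n = 100_000
dx = 1 / n
# Midpoint-rule approximation of the integral over [0, 1]
integral = sum(f((i + 0.5) * dx) * dx for i in range(n))
print(abs(integral - 1) < 1e-6)  # True
```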

Expectation, Variance, and Related Concepts

  • Expectation or mean of X: E[X] = ∑ x·p(x) for discrete, ∫ x·f(x) dx for continuous X.
  • Expectation is linear: E[aG1(X) + bG2(X) + c] = aE[G1(X)] + bE[G2(X)] + c.
  • Variance: Var(X) = E[(X – μ)²] = E[X²] – (E[X])²; standard deviation is sqrt(Variance).
  • Covariance: Cov(X, Y) = E[(X – E[X])(Y – E[Y])]; measures how X and Y change together.
  • Correlation: Cov(X, Y)/sqrt(Var(X)·Var(Y)), ranges between -1 and 1.
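These identities can be confirmed exactly on the three-dice example: each die has mean 3.5 and variance 35/12, so the sum has mean 10.5 and variance 8.75 (a small sketch):

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=3))
n = len(outcomes)

X = [sum(o) for o in outcomes]       # sum of three dice; each outcome has prob 1/216
D1 = [o[0] for o in outcomes]        # the first die alone

EX  = sum(X) / n                     # E[X]
EX2 = sum(x * x for x in X) / n      # E[X^2]
var = EX2 - EX ** 2                  # Var(X) = E[X^2] - (E[X])^2

print(EX)   # 10.5  (= 3 * 3.5)
print(var)  # 8.75  (= 3 * 35/12)

# Cov(D1, X) = E[D1*X] - E[D1]E[X]; equals Var(D1) since X = D1 + D2 + D3
cov = sum(d * x for d, x in zip(D1, X)) / n - (sum(D1) / n) * EX
print(round(cov, 4))  # 2.9167  (= 35/12)
```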

Joint, Marginal, and Conditional Distributions

  • Joint distribution: Probability of X = x and Y = y together.
  • Marginal distribution: Probability of X regardless of Y (sum/integrate joint over Y).
  • Conditional distribution: Probability that X = x given Y = y; equals the joint divided by the marginal of Y.
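The three relationships above can be sketched on a small joint pmf table (the numbers are assumed toy values for illustration):

```python
# Joint pmf of (X, Y): P(X = x and Y = y)
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginal of X: sum the joint over all values of Y
pX = {}
for (x, y), q in joint.items():
    pX[x] = pX.get(x, 0.0) + q
print(pX[0])  # ≈ 0.4

# Marginal of Y
pY = {}
for (x, y), q in joint.items():
    pY[y] = pY.get(y, 0.0) + q

# Conditional: P(X = x | Y = 1) = P(X = x, Y = 1) / P(Y = 1)
cond = {x: joint[(x, 1)] / pY[1] for x in (0, 1)}
print(cond[0])  # ≈ 0.4286 (= 0.3 / 0.7)
```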

Common Probability Distributions

  • Bernoulli: X is 0 or 1, P(X=1)=p; E[X]=p, Var(X)=p(1-p).
  • Binomial: X = number of successes in n Bernoulli trials; pmf uses "n choose x"; E[X]=np, Var(X)=np(1-p).
  • Geometric: X = number of trials to first success; pmf: (1-p)^(x-1)p; E[X]=1/p, Var(X)=(1-p)/p².
  • Uniform: Discrete: all n values have p=1/n; Continuous: X∼U(a,b), pdf=1/(b−a); E[X]=(a+b)/2, Var(X)=(b−a)²/12.
  • Normal (Gaussian): X∼N(μ,σ²), pdf is bell curve; used widely due to central limit theorem.
  • Beta: X in [0,1], two shape parameters α, β; E[X]=α/(α+β), Var(X)=αβ/[(α+β)²(α+β+1)].
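The tabulated means and variances can be verified directly from a pmf; a sketch for the binomial case (n = 10, p = 0.3 are illustrative values):

```python
from math import comb

# Binomial(n, p): pmf(x) = C(n, x) p^x (1-p)^(n-x), for x = 0..n
n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Compute E[X] and Var(X) from the pmf and compare to np and np(1-p)
mean = sum(x * q for x, q in enumerate(pmf))
var  = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2

print(round(mean, 10))  # 3.0  (= np)
print(round(var, 10))   # 2.1  (= np(1-p))
```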

Key Terms & Definitions

  • Random Variable — A variable mapping outcomes to real numbers, capturing randomness.
  • PMF (Probability Mass Function) — Assigns probability to each value of a discrete random variable.
  • PDF (Probability Density Function) — Describes relative likelihood for a continuous random variable.
  • CDF (Cumulative Distribution Function) — Probability that a random variable takes a value ≤ x.
  • Expectation/Mean (E[X]) — Average value of a random variable, weighted by probabilities.
  • Variance (Var(X)) — Measure of spread of a random variable around its mean.
  • Covariance — Measure of how two random variables change together.
  • Correlation — Normalized covariance, measures linear relationship between two variables.
  • Marginal Distribution — Probability distribution of a subset of a collection of random variables.
  • Conditional Distribution — Probability distribution of a random variable given the observed value of another.

Action Items / Next Steps

  • Review and practice derivations of pmfs, pdfs, expectations, and variances for all covered distributions.
  • Complete assigned reading on probability theory if any concept is unclear.
  • Participate in the forum for questions or clarifications.
  • Prepare for the first assignment containing probability theory questions.
  • Attend the next tutorial on linear algebra.