Overview
This lecture introduces random variables in probability theory and covers probability distributions, their statistical properties, and common distributions that are critical for machine learning and data science.
Random Variables
- A random variable maps outcomes of an experiment to real numbers, each with an associated probability.
- Examples: The sum when rolling three dice or the number of heads in three coin tosses.
- The range of a random variable is all possible values it can take.
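A random variable like "the sum when rolling three dice" can be sketched as a function from simulated outcomes to numbers. The following is a minimal illustration (the function name and sample size are arbitrary choices):

```python
import random
from collections import Counter

random.seed(0)

# Random variable X: the sum when rolling three fair dice.
# Each outcome (d1, d2, d3) is mapped to the real number d1 + d2 + d3.
def roll_three_dice():
    return sum(random.randint(1, 6) for _ in range(3))

samples = [roll_three_dice() for _ in range(10_000)]
freq = Counter(samples)

# The range of X is {3, 4, ..., 18}; all observed values fall inside it.
print(min(freq), max(freq))
```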
Probability Functions
- The induced probability function assigns probabilities to the values that a random variable can take.
- The cumulative distribution function (CDF), F_X(x) = P(X ≤ x), gives the probability that X takes a value at most x.
- A valid CDF is non-decreasing, has limits 0 at -∞ and 1 at ∞, and is right-continuous.
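The CDF properties above can be checked numerically. This sketch builds the CDF of X = number of heads in three fair coin tosses from its pmf (the running sum of probabilities):

```python
from math import comb

# pmf of X = number of heads in 3 fair coin tosses: P(X = x) = C(3, x) / 8
pmf = {x: comb(3, x) * 0.5**3 for x in range(4)}

def cdf(x):
    # F_X(x) = P(X <= x): sum the pmf over all values <= x
    return sum(p for v, p in pmf.items() if v <= x)

values = [cdf(x) for x in [-1, 0, 1, 2, 3, 10]]
# Non-decreasing, 0 below the support, 1 above it.
print(values)
```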
Types of Random Variables and Functions
- Discrete random variable: Has probability mass function (pmf); pmf sums to 1 and is non-negative.
- Continuous random variable: Has probability density function (pdf); pdf integrates to 1 and is non-negative.
Expectation, Variance, and Related Concepts
- Expectation or mean of X: E[X] = ∑ x·p(x) for discrete, ∫ x·f(x) dx for continuous X.
- Expectation is linear: E[aG1(X) + bG2(X) + c] = aE[G1(X)] + bE[G2(X)] + c.
- Variance: Var(X) = E[(X − μ)²] = E[X²] − (E[X])²; the standard deviation is sqrt(Var(X)).
- Covariance: Cov(X, Y) = E[(X – E[X])(Y – E[Y])]; measures how X and Y change together.
- Correlation: Corr(X, Y) = Cov(X, Y)/sqrt(Var(X)·Var(Y)); ranges between -1 and 1 and measures linear association.
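These formulas can be applied directly to a small pmf. The sketch below computes the mean, variance, and a linearity check for a fair die roll X, then uses Y = 7 − X (the opposite face) to show a covariance/correlation calculation:

```python
from math import sqrt

# pmf of a fair die roll X
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())        # E[X] = 3.5
ex2 = sum(x * x * p for x, p in pmf.items())     # E[X^2]
var = ex2 - mean**2                              # Var(X) = 35/12

# Linearity of expectation: E[2X + 1] = 2 E[X] + 1 = 8
e_lin = sum((2 * x + 1) * p for x, p in pmf.items())

# Y = 7 - X moves exactly opposite to X: Cov(X, Y) = -Var(X), Corr = -1
ey = sum((7 - x) * p for x, p in pmf.items())
cov = sum((x - mean) * ((7 - x) - ey) * p for x, p in pmf.items())
corr = cov / sqrt(var * var)                     # Var(Y) = Var(X)

print(mean, var, e_lin, corr)
```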
Joint, Marginal, and Conditional Distributions
- Joint distribution: Probability of X = x and Y = y together.
- Marginal distribution: Probability of X regardless of Y (sum/integrate joint over Y).
- Conditional distribution: Probability X = x given Y = y, equals joint divided by marginal on Y.
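The joint/marginal/conditional relationships above can be sketched with a small table of joint probabilities (the numbers below are hypothetical, chosen only for illustration):

```python
# Hypothetical joint pmf for two binary variables (X, Y); entries sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Marginal of X: sum the joint over all values of Y.
marg_x = {}
for (x, y), p in joint.items():
    marg_x[x] = marg_x.get(x, 0.0) + p

# Marginal of Y: sum the joint over all values of X.
marg_y = {}
for (x, y), p in joint.items():
    marg_y[y] = marg_y.get(y, 0.0) + p

# Conditional: P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
def cond_x_given_y(x, y):
    return joint[(x, y)] / marg_y[y]

print(marg_x, cond_x_given_y(1, 1))  # P(X=1 | Y=1) = 0.4 / 0.6
```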
Common Probability Distributions
- Bernoulli: X is 0 or 1, P(X=1)=p; E[X]=p, Var(X)=p(1-p).
- Binomial: X = number of successes in n Bernoulli trials; pmf uses "n choose x"; E[X]=np, Var(X)=np(1-p).
- Geometric: X = number of trials until the first success (x = 1, 2, …); pmf: (1-p)^(x-1)p; E[X]=1/p, Var(X)=(1-p)/p².
- Uniform: Discrete: all n values have p=1/n; Continuous: X∼U(a,b), pdf=1/(b−a); E[X]=(a+b)/2, Var(X)=(b−a)²/12.
- Normal (Gaussian): X∼N(μ,σ²), pdf is bell curve; used widely due to central limit theorem.
- Beta: X in [0,1], two shape parameters α, β; E[X]=α/(α+β), Var(X)=αβ/[(α+β)²(α+β+1)].
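The stated moments can be verified by summing directly over a pmf. A minimal check for the binomial formulas E[X] = np and Var(X) = np(1−p), with n = 10 and p = 0.3 chosen arbitrarily:

```python
from math import comb

# Binomial(n=10, p=0.3): compute mean and variance by enumerating the pmf.
n, p = 10, 0.3
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

mean = sum(x * q for x, q in pmf.items())
var = sum((x - mean) ** 2 * q for x, q in pmf.items())

# Expected: E[X] = np = 3.0 and Var(X) = np(1-p) = 2.1
print(round(mean, 6), round(var, 6))
```

The same enumeration pattern works for the Bernoulli and (with a truncated sum) the geometric distribution.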
Key Terms & Definitions
- Random Variable — A variable mapping outcomes to real numbers, capturing randomness.
- PMF (Probability Mass Function) — Assigns probability to each value of a discrete random variable.
- PDF (Probability Density Function) — Describes relative likelihood for a continuous random variable.
- CDF (Cumulative Distribution Function) — Probability that a random variable takes a value ≤ x.
- Expectation/Mean (E[X]) — Average value of a random variable, weighted by probabilities.
- Variance (Var(X)) — Measure of spread of a random variable around its mean.
- Covariance — Measure of how two random variables change together.
- Correlation — Normalized covariance, measures linear relationship between two variables.
- Marginal Distribution — Probability distribution of a subset of a collection of random variables.
- Conditional Distribution — Probability distribution of a random variable given that another variable takes a specific value.
Action Items / Next Steps
- Review and practice derivations of pmfs, pdfs, expectations, and variances for all covered distributions.
- Complete assigned reading on probability theory if any concept is unclear.
- Participate in the forum for questions or clarifications.
- Prepare for the first assignment containing probability theory questions.
- Attend the next tutorial on linear algebra.