EM Algorithm Overview

Jun 15, 2025

Overview

Today's lecture introduced the Expectation-Maximization (EM) algorithm, focusing on its application to Gaussian mixture models, and compared it with K-means clustering.

Introduction to EM Algorithm

  • The Expectation-Maximization (EM) algorithm is used when a model involves both missing data and unknown parameters.
  • If only the data or only the parameters were unknown, estimation would be straightforward and EM would not be needed.
  • EM operates iteratively, alternating between estimating missing data (E step) and unknown parameters (M step).

Simple EM Example

  • Example: given data points from a normal distribution with one value X missing, estimate both X and the mean μ.
  • Start with a guess for μ, use it to estimate X, then re-estimate μ from all the data, iterating until convergence.
  • This cycle between estimating missing values and parameters is the core of EM; a small sketch follows this list.
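
As a concrete illustration, here is a minimal NumPy sketch of this toy problem. The data values, the initial guess, and the tolerance are made up for the example: the E step imputes the missing value with the current mean, and the M step re-estimates the mean from the observed plus imputed data.

    import numpy as np

    # Observed samples assumed to come from a normal distribution; one value is missing.
    observed = np.array([4.2, 5.1, 3.8, 4.9, 5.4])

    mu = 0.0  # initial guess for the unknown mean
    for _ in range(100):
        x_hat = mu                                               # E step: expected value of the missing point
        mu_new = (observed.sum() + x_hat) / (len(observed) + 1)  # M step: mean over observed + imputed
        if abs(mu_new - mu) < 1e-10:                             # stop once the estimate stabilizes
            break
        mu = mu_new

    print(mu)  # converges to the mean of the observed values

In this toy case the fixed point is just the observed-data mean, which is what dropping the missing value would give directly; the same alternation becomes genuinely useful in the mixture setting below.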

EM for Gaussian Mixture Models

  • Mixture models assume the data are generated by several Gaussian distributions (clusters); each point's cluster identity is unobserved (see the density below).
  • The parameters of each Gaussian component, i.e. the means, variances, and mixing proportions, are unknown.
  • The missing data are the cluster assignments of the individual points.
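
In standard notation (not taken from the lecture slides), a K-component Gaussian mixture has the density

    p(x) = Σₖ πₖ · N(x | μₖ, σₖ²),   πₖ ≥ 0,   Σₖ πₖ = 1,

where Σₖ denotes the sum over k = 1, …, K, the πₖ are the mixing proportions, and N(x | μₖ, σₖ²) is a Gaussian density with mean μₖ and variance σₖ² (a covariance matrix in the multivariate case).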

EM Algorithm Steps in Gaussian Mixtures

  • Randomly initialize the cluster parameters (means, variances, mixture proportions).
  • E step: compute membership weights, i.e. the probability that each point belongs to each cluster, via Bayes' rule as in a Bayes classifier.
  • M step: update the cluster means, variances, and mixture proportions using these membership weights (weighted means and covariances).
  • Iterate the E and M steps until convergence; a code sketch follows this list.
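
The sketch below puts these steps together for full-covariance Gaussians. It is a minimal illustration, not the lecture's code: the function name em_gmm, the initialization from random data points, the fixed iteration count, and the small ridge added to each covariance are all assumptions made for the example.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, K, n_iter=100, seed=0):
        """Minimal EM loop for a K-component Gaussian mixture (illustrative)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        means = X[rng.choice(n, K, replace=False)]   # init: K random data points
        covs = np.array([np.eye(d)] * K)             # init: identity covariances
        pis = np.full(K, 1.0 / K)                    # init: uniform mixing proportions

        for _ in range(n_iter):
            # E step: membership weights w[i, k] ∝ π_k · N(x_i | μ_k, Σ_k)
            w = np.column_stack([
                pis[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
                for k in range(K)
            ])
            w /= w.sum(axis=1, keepdims=True)        # each row now sums to 1

            # M step: weighted updates of proportions, means, covariances
            Nk = w.sum(axis=0)                       # effective cluster sizes
            pis = Nk / n
            means = (w.T @ X) / Nk[:, None]
            for k in range(K):
                diff = X - means[k]
                covs[k] = (w[:, k, None] * diff).T @ diff / Nk[k]
                covs[k] += 1e-6 * np.eye(d)          # ridge keeps covariances invertible
        return pis, means, covs

Calling pis, means, covs = em_gmm(X, K=3) on an (n, d) array X runs the loop; in practice one would also monitor the log-likelihood to decide when to stop rather than using a fixed iteration count.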

Advantages Over K-means Clustering

  • K-means assigns each point strictly to the nearest centroid using Euclidean distance.
  • EM uses probabilistic assignments, accounting for the variance and shape of clusters.
  • EM is preferable when clusters differ in variance or have elliptical shapes; the small example below shows why.
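
A tiny example of the difference, with numbers invented for illustration: a point can lie closer to one centroid in Euclidean distance while being far more probable under a wider cluster.

    import numpy as np
    from scipy.stats import norm

    x = 2.0
    mu_a, sd_a = 0.0, 0.5    # tight cluster A
    mu_b, sd_b = 5.0, 3.0    # wide cluster B

    # K-means-style hard assignment: nearest centroid wins.
    hard = 'A' if abs(x - mu_a) < abs(x - mu_b) else 'B'

    # EM-style soft assignment: posterior weights under equal priors.
    p_a, p_b = norm.pdf(x, mu_a, sd_a), norm.pdf(x, mu_b, sd_b)
    w_a = p_a / (p_a + p_b)

    print(hard, round(w_a, 3))   # prints: A 0.003  (K-means picks A, EM strongly favors B)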

Implementation Notes

  • Membership weights sum to 1 for each data point; they are the "soft" cluster assignments.
  • Weighted means and covariances computed from these weights give the updated parameter estimates (a small helper is sketched below).
  • EM often recovers from a poor initialization by iteratively improving the assignments, though like K-means it can still settle in a local optimum.
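
For the M step, the weighted statistics can also be computed with NumPy's built-ins; weighted_mean_cov is a helper name made up for this example, and w is one cluster's column of membership weights.

    import numpy as np

    def weighted_mean_cov(X, w):
        """Weighted mean and covariance for one cluster (illustrative)."""
        mean = np.average(X, axis=0, weights=w)
        # bias=True divides by the total weight, matching the EM covariance update
        cov = np.cov(X, rowvar=False, aweights=w, bias=True)
        return mean, cov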

Key Terms & Definitions

  • Expectation-Maximization (EM) Algorithm — Iterative method to estimate missing data and unknown parameters.
  • E step (Expectation step) — Estimate missing data (e.g., cluster memberships).
  • M step (Maximization step) — Update model parameters to maximize the expected complete-data log-likelihood.
  • Gaussian Mixture Model — Probabilistic model assuming data is generated from several normal distributions.
  • Membership weight — Probability a data point belongs to a specific cluster.
  • Bayes classifier — Classification approach using likelihoods and prior probabilities.

Action Items / Next Steps

  • Complete homework: create plots and apply EM to mixture data as described.
  • Review slides on calculating assignment weights, weighted means, and covariance matrices.
  • Optional: Experiment with initial parameter settings to observe EM convergence.