Overview
Today's lecture introduced the Expectation-Maximization (EM) algorithm, focusing on its application to Gaussian mixture models, and compared it with K-means clustering.
Introduction to EM Algorithm
- The Expectation-Maximization (EM) algorithm is used when the data contains missing values and the model parameters are unknown, so the two must be estimated jointly.
- If only one of the two is unknown, estimation is straightforward and EM is unnecessary: known parameters let you fill in the missing data directly, and complete data lets you estimate the parameters directly.
- EM operates iteratively, alternating between estimating the missing data given the current parameters (E step) and re-estimating the parameters given the filled-in data (M step).
Simple EM Example
- Example: Given data points from a normal distribution, estimate a missing value X and the mean μ when both are unknown.
- Start with a guess for μ, estimate X, then re-estimate μ, iterating until convergence.
- This cycle between estimating missing values and parameters is the core of EM; a minimal sketch follows below.
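As a concrete illustration, here is a minimal sketch of this toy example in Python; the data values, starting guess, and tolerance are invented for the demonstration.

```python
import numpy as np

observed = np.array([4.2, 5.1, 3.8, 4.9, 5.4])  # hypothetical draws from N(mu, sigma^2)

mu = 0.0  # deliberately poor initial guess for the mean
for step in range(50):
    # E step: with mu fixed, the best estimate of the missing value is mu itself
    x_missing = mu
    # M step: re-estimate mu from the observed data plus the imputed value
    mu_new = (observed.sum() + x_missing) / (len(observed) + 1)
    if abs(mu_new - mu) < 1e-8:  # stop once the estimate has stabilized
        mu = mu_new
        break
    mu = mu_new

print(f"converged mu = {mu:.4f}")
```

Note that μ settles at the mean of the observed points: once the imputed value carries no information of its own, the observed data alone determine the answer.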
EM for Gaussian Mixture Models
- Mixture models assume the data is drawn from several Gaussian distributions (the clusters); which Gaussian generated each point is unobserved.
- Parameters (means, variances, and mixing proportions) of each Gaussian component are unknown.
- The missing data are the per-point cluster assignments (see the sampling sketch below).
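To make the "missing data" concrete, here is a short sketch of the generative process behind a Gaussian mixture; all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])   # mixing proportions (cluster priors)
means = np.array([-3.0, 0.0, 4.0])    # per-cluster means
stds = np.array([1.0, 0.5, 1.5])      # per-cluster standard deviations

labels = rng.choice(3, size=500, p=weights)     # latent cluster labels -- the "missing" data
data = rng.normal(means[labels], stds[labels])  # observed values; the labels are discarded
```

EM's task is to run this story in reverse: recover the weights, means, and spreads from `data` alone, with `labels` never observed.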
EM Algorithm Steps in Gaussian Mixtures
- Randomly initialize cluster parameters (means, variances, mixture proportions).
- E step: Compute membership weights (the probability that each point belongs to each cluster) via Bayes' rule, treating the current Gaussians as class-conditional likelihoods and the mixing proportions as priors.
- M step: Update cluster means, variances, and mixture proportions using these membership weights (weighted means and covariances).
- Iterate E and M steps until convergence (the parameters or the likelihood stop changing); a compact sketch of the full loop follows below.
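Putting the steps together, here is a compact sketch of the loop for a one-dimensional mixture; the function name and defaults are illustrative, and the lecture's multivariate version replaces the scalar variances with covariance matrices.

```python
import numpy as np

def em_gmm_1d(data, k, n_iter=100, seed=0):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Random initialization of means, variances, and mixing proportions
    mu = rng.choice(data, size=k, replace=False)
    var = np.full(k, data.var())
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E step: membership weights via Bayes' rule --
        # prior (pi) times Gaussian likelihood, normalized so each row sums to 1
        lik = np.exp(-0.5 * (data[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        w = pi * lik                        # shape (n, k)
        w /= w.sum(axis=1, keepdims=True)

        # M step: weighted re-estimation of the parameters
        nk = w.sum(axis=0)                  # effective number of points per cluster
        mu = (w * data[:, None]).sum(axis=0) / nk
        var = (w * (data[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / n

    return mu, var, pi
```

Applied to the sample generated earlier, `em_gmm_1d(data, k=3)` should approximately recover the three means, spreads, and mixing proportions (up to a relabeling of the clusters).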
Advantages Over K-means Clustering
- K-means makes a hard assignment: each point belongs entirely to the nearest centroid under Euclidean distance.
- EM uses soft, probabilistic assignments that account for the variance and shape of each cluster.
- EM is preferable when clusters differ in variance or have elliptical rather than spherical shapes (see the small comparison below).
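A small worked contrast with made-up numbers: the point below is nearer to the first centroid, so K-means assigns it to cluster 0, but EM's soft assignment slightly favors the second cluster because that cluster is much wider.

```python
import numpy as np

x = 0.9
centroids = np.array([0.0, 2.0])
hard = np.argmin(np.abs(x - centroids))  # K-means: nearest centroid wins outright -> 0

var = np.array([0.25, 4.0])              # EM also weighs each cluster's spread
pi = np.array([0.5, 0.5])                # equal priors for the illustration
lik = np.exp(-0.5 * (x - centroids) ** 2 / var) / np.sqrt(2 * np.pi * var)
soft = pi * lik / (pi * lik).sum()       # roughly [0.48, 0.52] -- cluster 1 edges ahead
```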
Implementation Notes
- Membership weights sum to 1 across clusters for each data point; these weights are the "soft" cluster assignments.
- The M step computes means and covariances weighted by membership, so every point contributes to every cluster in proportion to its weight (update formulas below).
- EM can often recover from a poor initialization by iteratively improving the assignments, though it is only guaranteed to reach a local optimum of the likelihood.
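For reference, the standard weighted M-step updates (notation assumed, since the slides' notation is not reproduced here: w_ik is the membership weight of point x_i in cluster k, and n is the number of points):

```latex
N_k = \sum_{i=1}^{n} w_{ik}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} w_{ik}\, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} w_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top}, \qquad
\pi_k = \frac{N_k}{n}
```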
Key Terms & Definitions
- Expectation-Maximization (EM) Algorithm — Iterative method to estimate missing data and unknown parameters.
- E step (Expectation step) — Estimate missing data (e.g., cluster memberships).
- M step (Maximization step) — Update model parameters to maximize likelihood.
- Gaussian Mixture Model — Probabilistic model assuming data is generated from several normal distributions.
- Membership weight — Probability a data point belongs to a specific cluster.
- Bayes classifier — Classification approach using likelihoods and prior probabilities.
Action Items / Next Steps
- Complete homework: create plots and apply EM to mixture data as described.
- Review slides on calculating assignment weights, weighted means, and covariance matrices.
- Optional: Experiment with initial parameter settings to observe EM convergence.