Overview
Today's lecture introduced the Expectation-Maximization (EM) algorithm, focusing on its application to Gaussian mixture models, and compared it with K-means clustering.
Introduction to EM Algorithm
- The Expectation-Maximization (EM) algorithm is used when the data contains missing values and the model parameters are unknown, so the two must be estimated jointly.
- If only one of the two is unknown, estimation is straightforward and EM is unnecessary: known parameters let you fill in the missing data directly, and complete data lets you estimate the parameters directly.
- EM operates iteratively, alternating between estimating the missing data given the current parameters (E step) and re-estimating the parameters given the filled-in data (M step).
Simple EM Example
- Example: Given data points from a normal distribution, estimate a missing value X and the mean μ when both are unknown.
- Start with a guess for μ, estimate X, then re-estimate μ, iterating until convergence.
- This cycle between estimating missing values and parameters is the core of EM; a minimal sketch follows below.
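As a concrete illustration, here is a minimal sketch of this toy example in Python; the data values, starting guess, and tolerance are invented for the demonstration.

```python
import numpy as np

observed = np.array([4.2, 5.1, 3.8, 4.9, 5.4])  # hypothetical draws from N(mu, sigma^2)

mu = 0.0  # deliberately poor initial guess for the mean
for step in range(50):
    # E step: with mu fixed, the best estimate of the missing value is mu itself
    x_missing = mu
    # M step: re-estimate mu from the observed data plus the imputed value
    mu_new = (observed.sum() + x_missing) / (len(observed) + 1)
    if abs(mu_new - mu) < 1e-8:  # stop once the estimate has stabilized
        mu = mu_new
        break
    mu = mu_new

print(f"converged mu = {mu:.4f}")
```

Note that μ settles at the mean of the observed points: once the imputed value carries no information of its own, the observed data alone determine the answer.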
EM for Gaussian Mixture Models
- Mixture models assume the data is drawn from several Gaussian distributions (the clusters); which Gaussian generated each point is unobserved.
- Parameters (means, variances, and mixing proportions) of each Gaussian component are unknown.
- The missing data are the per-point cluster assignments (see the sampling sketch below).
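To make the "missing data" concrete, here is a short sketch of the generative process behind a Gaussian mixture; all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])   # mixing proportions (cluster priors)
means = np.array([-3.0, 0.0, 4.0])    # per-cluster means
stds = np.array([1.0, 0.5, 1.5])      # per-cluster standard deviations

labels = rng.choice(3, size=500, p=weights)     # latent cluster labels -- the "missing" data
data = rng.normal(means[labels], stds[labels])  # observed values; the labels are discarded
```

EM's task is to run this story in reverse: recover the weights, means, and spreads from `data` alone, with `labels` never observed.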
EM Algorithm Steps in Gaussian Mixtures
- Randomly initialize cluster parameters (means, variances, mixture proportions).
- E step: Compute membership weights (the probability that each point belongs to each cluster) via Bayes' rule, treating the current Gaussians as class-conditional likelihoods and the mixing proportions as priors.
- M step: Update cluster means, variances, and mixture proportions using these membership weights (weighted means and covariances).
- Iterate E and M steps until convergence (the parameters or the likelihood stop changing); a compact sketch of the full loop follows below.
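Putting the steps together, here is a compact sketch of the loop for a one-dimensional mixture; the function name and defaults are illustrative, and the lecture's multivariate version replaces the scalar variances with covariance matrices.

```python
import numpy as np

def em_gmm_1d(data, k, n_iter=100, seed=0):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Random initialization of means, variances, and mixing proportions
    mu = rng.choice(data, size=k, replace=False)
    var = np.full(k, data.var())
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E step: membership weights via Bayes' rule --
        # prior (pi) times Gaussian likelihood, normalized so each row sums to 1
        lik = np.exp(-0.5 * (data[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        w = pi * lik                        # shape (n, k)
        w /= w.sum(axis=1, keepdims=True)

        # M step: weighted re-estimation of the parameters
        nk = w.sum(axis=0)                  # effective number of points per cluster
        mu = (w * data[:, None]).sum(axis=0) / nk
        var = (w * (data[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / n

    return mu, var, pi
```

Applied to the sample generated earlier, `em_gmm_1d(data, k=3)` should approximately recover the three means, spreads, and mixing proportions (up to a relabeling of the clusters).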
Advantages Over K-means Clustering
- K-means makes a hard assignment: each point belongs entirely to the nearest centroid under Euclidean distance.
- EM uses soft, probabilistic assignments that account for the variance and shape of each cluster.
- EM is preferable when clusters differ in variance or have elliptical rather than spherical shapes (see the small comparison below).
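A small worked contrast with made-up numbers: the point below is nearer to the first centroid, so K-means assigns it to cluster 0, but EM's soft assignment slightly favors the second cluster because that cluster is much wider.

```python
import numpy as np

x = 0.9
centroids = np.array([0.0, 2.0])
hard = np.argmin(np.abs(x - centroids))  # K-means: nearest centroid wins outright -> 0

var = np.array([0.25, 4.0])              # EM also weighs each cluster's spread
pi = np.array([0.5, 0.5])                # equal priors for the illustration
lik = np.exp(-0.5 * (x - centroids) ** 2 / var) / np.sqrt(2 * np.pi * var)
soft = pi * lik / (pi * lik).sum()       # roughly [0.48, 0.52] -- cluster 1 edges ahead
```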
Implementation Notes
- Membership weights sum to 1 across clusters for each data point; these weights are the "soft" cluster assignments.
- The M step computes means and covariances weighted by membership, so every point contributes to every cluster in proportion to its weight (update formulas below).
- EM can often recover from a poor initialization by iteratively improving the assignments, though it is only guaranteed to reach a local optimum of the likelihood.
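For reference, the standard weighted M-step updates (notation assumed, since the slides' notation is not reproduced here: w_ik is the membership weight of point x_i in cluster k, and n is the number of points):

```latex
N_k = \sum_{i=1}^{n} w_{ik}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} w_{ik}\, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} w_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top}, \qquad
\pi_k = \frac{N_k}{n}
```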
Key Terms & Definitions
- Expectation-Maximization (EM) Algorithm — Iterative method to estimate missing data and unknown parameters.
- E step (Expectation step) — Estimate missing data (e.g., cluster memberships).
- M step (Maximization step) — Update model parameters to maximize likelihood.
- Gaussian Mixture Model — Probabilistic model assuming data is generated from several normal distributions.
- Membership weight — Probability a data point belongs to a specific cluster.
- Bayes classifier — Classification approach using likelihoods and prior probabilities.
Action Items / Next Steps
- Complete homework: create plots and apply EM to mixture data as described.
- Review slides on calculating assignment weights, weighted means, and covariance matrices.
- Optional: Experiment with initial parameter settings to observe EM convergence.