Linear Classifiers and Hyperplanes

Oct 24, 2025

Overview

This lecture introduced linear classifiers, the geometric interpretation of separating hyperplanes, and the perceptron learning algorithm, highlighting its properties, limitations, and connections to gradient descent.

Approaches to Linear Classification

  • Linear classifiers can be built by comparing discriminant functions or directly modeling the separating hyperplane.
  • A separating hyperplane consists of all points x such that ( f(x) = 0 ), where ( f(x) ) defines the hyperplane.
  • The sign of ( f(x) ) determines on which side of the hyperplane a point lies.
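The sign-based decision rule above can be sketched as follows; the specific values of `beta` and `beta_0` are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical hyperplane parameters for illustration: beta is the
# normal direction, beta_0 the offset.
beta = np.array([2.0, -1.0])
beta_0 = 0.5

def f(x):
    """Linear discriminant: f(x) = beta^T x + beta_0."""
    return beta @ x + beta_0

def classify(x):
    """The sign of f(x) tells which side of the hyperplane x lies on."""
    return 1 if f(x) > 0 else -1

print(classify(np.array([1.0, 0.0])))   # f = 2.5 > 0, so class +1
print(classify(np.array([-1.0, 1.0])))  # f = -2.5 < 0, so class -1
```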

Properties of Separating Hyperplanes

  • The vector ( \beta ) is normal (perpendicular) to the hyperplane.
  • The equation ( \beta^T x + \beta_0 = 0 ) defines the hyperplane.
  • The signed distance from a point x to the hyperplane is proportional to ( f(x) ): dividing by the norm of the normal vector, ( f(x) / \|\beta\| ), gives the actual signed distance from x to the hyperplane.
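A small numeric check of the distance formula, using a hypothetical hyperplane chosen so the arithmetic is easy to verify by hand:

```python
import numpy as np

# Hypothetical hyperplane for illustration: 3*x1 + 4*x2 - 5 = 0.
beta = np.array([3.0, 4.0])   # ||beta|| = 5
beta_0 = -5.0

def signed_distance(x):
    """Signed distance from x to the hyperplane beta^T x + beta_0 = 0.
    f(x) alone is only proportional to the distance; dividing by
    ||beta|| yields the actual (signed) distance."""
    return (beta @ x + beta_0) / np.linalg.norm(beta)

x = np.array([3.0, 4.0])
# f(x) = 9 + 16 - 5 = 20 and ||beta|| = 5, so the distance is 4.
print(signed_distance(x))
```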

Perceptron Learning Algorithm

  • The perceptron algorithm adjusts the separating hyperplane to correctly classify training data.
  • Misclassified points are identified when the sign of ( f(x) ) disagrees with the true label.
  • The perceptron criterion (objective function) penalizes misclassified points in proportion to their distance from the hyperplane: ( D(\beta, \beta_0) = -\sum_{i \in M} y_i (\beta^T x_i + \beta_0) ), where M is the set of misclassified points.
  • The algorithm uses stochastic (random) gradient descent: after each misclassified point ( (x_i, y_i) ), update ( \beta \leftarrow \beta + \rho\, y_i x_i ) and ( \beta_0 \leftarrow \beta_0 + \rho\, y_i ).
  • Step size (( \rho )) is usually set to 1 in the perceptron but is chosen smaller in general stochastic gradient descent.
  • If the data are linearly separable, the perceptron converges to a separating hyperplane in a finite number of steps, but which hyperplane it finds depends on the initialization and the order in which points are visited.
  • If data are not linearly separable, the algorithm may not converge and can cycle indefinitely.

Limitations and Extensions

  • Multiple hyperplanes may separate linearly separable data; the algorithm converges to one such hyperplane.
  • Perceptron fails for non-linearly separable data (e.g., the XOR problem).
  • Enlarging input space with feature transformations can make data more separable.
  • There is often a trade-off between step size and convergence speed or stability.
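The XOR failure and the feature-transformation fix can be seen concretely. XOR labels are not linearly separable in the original 2-D input space, but adding one product feature (a hypothetical choice; any transformation that separates the classes works) makes a linear separator possible:

```python
import numpy as np

# XOR: no hyperplane in the original 2-D space separates these labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Enlarge the input space with the product feature x1 * x2.
Phi = np.c_[X, X[:, 0] * X[:, 1]]

# In the enlarged 3-D space, a linear separator exists; this particular
# (beta, beta_0) is one hand-picked example.
beta = np.array([1.0, 1.0, -2.0])
beta_0 = -0.5
print(np.sign(Phi @ beta + beta_0))  # matches y: [-1, 1, 1, -1]
```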

Key Terms & Definitions

  • Separating Hyperplane — A boundary that divides data into different classes; defined by ( \beta^T x + \beta_0 = 0 ).
  • Perceptron Learning Algorithm — An iterative method to find a linear separator by updating parameters using misclassified examples.
  • Stochastic Gradient Descent — Optimization method updating parameters based on individual data points rather than the whole dataset.
  • Linearly Separable — Data are linearly separable if there exists a hyperplane that perfectly separates the classes with no misclassification.

Action Items / Next Steps

  • Review geometry and properties of hyperplanes.
  • Prepare to discuss methods for defining an optimal separating hyperplane in the next class.