Support Vector Machines Lecture Notes

Jul 23, 2024

Support Vector Machines (SVM)

Overview

  • SVM is a supervised machine learning model; its basic form learns a linear decision boundary, and kernels extend it to nonlinear problems.
  • Given labeled training data, it finds the optimal separating hyperplane and uses that hyperplane to categorize new examples.

Concepts of Hyperplane

  • Hyperplane in 1D Space: A point.
  • Hyperplane in 2D Space: A line.
  • Hyperplane in 3D Space: A plane dividing the space into two halves.
  • In general, a hyperplane in an n-dimensional space is a flat subspace of dimension n - 1.

Linear Separability

  • Linearly Separable Data: Can be separated by a straight line.
    • Example shown in Figure 1.
  • Non-Linearly Separable Data: Cannot be separated by a straight line.
    • Example shown in Figure 2.

Choosing the Best Hyperplane

  • There are multiple possible hyperplanes to separate classes.
  • SVM selects the best hyperplane by maximizing the margin or "street" around it.
  • SVM is often called the Maximum Margin Classifier.
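
A minimal sketch of fitting a maximum-margin classifier with scikit-learn (assuming scikit-learn and NumPy are installed; the dataset is a synthetic, linearly separable toy set):

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Two well-separated clusters -> linearly separable toy data
    X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

    # A very large C approximates a hard-margin SVM on separable data
    clf = SVC(kernel="linear", C=1e6)
    clf.fit(X, y)

    W = clf.coef_[0]       # normal vector of the separating hyperplane
    B = clf.intercept_[0]  # bias term
    print("Hyperplane: W·x + B = 0 with W =", W, "and B =", B)
    print("Margin width 2/||W|| =", 2 / np.linalg.norm(W))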

Support Vectors

  • Support vectors are points closest to the hyperplane and hardest to classify.
  • These points alone determine the final decision boundary.
    • Removing or moving any of the other points does not change the boundary.
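
Continuing the scikit-learn sketch above (clf, X, y, and np are assumed from that snippet), the support vectors can be inspected directly, and refitting on them alone reproduces the same boundary:

    # The support vectors are the training points on (or inside) the margin
    print("Support vectors per class:", clf.n_support_)
    print(clf.support_vectors_)

    # Refitting on the support vectors alone should recover (numerically almost)
    # the same hyperplane, since the other points do not affect the boundary
    clf_sv = SVC(kernel="linear", C=1e6).fit(clf.support_vectors_, y[clf.support_])
    print("Same W?", np.allclose(clf.coef_, clf_sv.coef_, atol=1e-3))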

Prediction in SVM

  • During inference, SVM predicts output (Y) based on input features (X).
  • SVM learns a weight (W) for each input feature and combines them, together with a bias term (B), to predict the output.
  • Unlike neural networks, SVM optimizes weights based only on support vectors.
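
A small sketch of prediction using the fitted clf, W, and B from the earlier snippet (the new point is hypothetical); for a linear kernel the decision value is simply W · x + B:

    x_new = np.array([[1.0, 4.0]])   # hypothetical new sample

    manual_score = x_new @ W + B     # W·x + B, computed by hand
    print("Manual decision value: ", manual_score)
    print("Library decision value:", clf.decision_function(x_new))
    print("Predicted class:", clf.predict(x_new))  # positive decision value -> class 1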

Mathematical Foundation

Hyperplane Equation

  • For a 2D example with three parallel lines (h0, h1, h2), the relationships are:
    • h0 (decision hyperplane): W · x + B = 0
    • h1 (positive margin boundary): W · x + B = 1
    • h2 (negative margin boundary): W · x + B = -1
    • Where W is a vector perpendicular to the hyperplane and B is the bias.
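    • Worked example: with W = (1, 1) and B = -3, h0 is the line x1 + x2 = 3; the point (4, 2) gives W · x + B = 3 ≥ 1, so it lies on the positive side of the margin.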

Margin and Constraints

  • The margin (the width of the street between h1 and h2) is: 2 / ||W||
  • Maximizing the margin is equivalent to minimizing 1/2 * ||W||^2 under the constraints:
    • For positive samples (y_i = +1): W · x_i + B ≥ 1
    • For negative samples (y_i = -1): W · x_i + B ≤ -1
    • Both can be written compactly as y_i * (W · x_i + B) ≥ 1, as checked numerically below.
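
These constraints can be checked on the (nearly) hard-margin fit from the earlier scikit-learn sketch (clf, X, y, W, B, and np are assumed; the 0/1 labels are first mapped to ±1):

    # Map labels {0, 1} to {-1, +1} to match the constraint form
    y_signed = np.where(y == 1, 1, -1)

    # For separable data, every point satisfies y_i * (W·x_i + B) >= 1,
    # with equality (approximately) for the support vectors
    margins = y_signed * (X @ W + B)
    print("min of y_i * (W·x_i + B):", margins.min())
    print("Margin width 2/||W||:   ", 2 / np.linalg.norm(W))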

Lagrangian Multipliers

  • To solve constrained optimization problems, we use Lagrangian multipliers.
  • This creates a new expression combining constraints and the optimization objective.
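  • For the hard-margin problem, the resulting expression is the standard Lagrangian:
    • L(W, B, α) = 1/2 * ||W||^2 - Σ α_i * [y_i * (W · x_i + B) - 1], with multipliers α_i ≥ 0.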

Deriving Weights

  • Setting the partial derivatives of the Lagrangian to zero gives:
    • ∂L/∂W = 0 → W = Σ α_i * y_i * x_i (α_i is non-zero only for the support vectors)
    • ∂L/∂B = 0 → Σ α_i * y_i = 0
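  • Substituting these back into the Lagrangian gives the dual problem:
    • Maximize Σ α_i - 1/2 * Σ_i Σ_j α_i * α_j * y_i * y_j * (x_i · x_j), subject to α_i ≥ 0 and Σ α_i * y_i = 0.
    • The training data appear only through the dot products x_i · x_j, which is what the kernel trick exploits below.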

Soft Margin SVM

  • Deals with non-separable data and outliers by introducing slack variables (ξ_i ≥ 0).
    • The constraints are relaxed to y_i * (W · x_i + B) ≥ 1 - ξ_i, allowing some misclassifications.
    • The objective becomes minimizing 1/2 * ||W||^2 + C * Σ ξ_i, where the regularization parameter C balances margin width against misclassification tolerance (see the sketch after this list):
      • Small C: Wider margin, accepts more misclassifications.
      • Large C: Narrower margin, less tolerance to outliers.
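
A minimal sketch of the effect of C, using scikit-learn on a synthetic, overlapping dataset (the specific values of C are arbitrary):

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Overlapping clusters -> not perfectly separable
    X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        width = 2 / np.linalg.norm(clf.coef_[0])
        print(f"C={C:>6}: margin width = {width:.3f}, "
              f"support vectors = {len(clf.support_vectors_)}, "
              f"training accuracy = {clf.score(X, y):.3f}")
    # Typically: small C -> wider margin, more support vectors;
    #            large C -> narrower margin, fewer misclassified training points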

Handling Non-Linearly Separable Data

  • Kernel Trick: Implicitly maps the data to a higher-dimensional space where it becomes (more nearly) linearly separable.
    • Replace the dot product x_i · x_j in the dual problem with a kernel function K(x_i, x_j), which computes the dot product in the higher-dimensional space without constructing it explicitly.
  • Common Kernels:
    • Polynomial Kernel: K(X_i, X_j) = (c + X_i · X_j)^D, where the constant c is distinct from the regularization parameter C.
    • RBF (Gaussian Kernel): K(X_i, X_j) = exp(-γ ||X_i - X_j||^2)
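
A sketch comparing these kernels with scikit-learn on data that is not linearly separable (make_moons is used purely as an illustrative dataset; the kernel parameters are arbitrary):

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for kernel, params in [("linear", {}),
                           ("poly", {"degree": 3, "coef0": 1.0, "gamma": 1.0}),  # (c + x_i·x_j)^D
                           ("rbf", {"gamma": 1.0})]:                             # exp(-γ||x_i - x_j||^2)
        clf = SVC(kernel=kernel, C=1.0, **params).fit(X_train, y_train)
        print(f"{kernel:>6} kernel: test accuracy = {clf.score(X_test, y_test):.3f}")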

Effects of Gamma in RBF Kernel

  • Small γ makes the decision boundary smooth and nearly linear (approximating a linear SVM), while large γ makes the boundary very sensitive to individual training points and can lead to overfitting.
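
In practice γ (and C) are usually chosen by cross-validation rather than guessed; a sketch continuing the make_moons example above (the grid values are arbitrary):

    from sklearn.model_selection import GridSearchCV

    # Very small γ behaves almost like a linear SVM; very large γ tends to overfit
    param_grid = {"gamma": [0.01, 0.1, 1.0, 10.0, 100.0], "C": [0.1, 1.0, 10.0]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)

    print("Best parameters:", search.best_params_)
    print("Cross-validated accuracy:", round(search.best_score_, 3))
    print("Held-out test accuracy:", round(search.score(X_test, y_test), 3))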

Conclusion

  • Soft margin SVM handles noisy or non-linearly separable data effectively.
  • Understanding the underlying mathematics and kernel trick is crucial for successful SVM application.
