
CS 231N: Loss Functions and Optimization

Jul 17, 2024

CS 231N Lecture 3: Loss Functions and Optimization

Administrative Notes

  • Assignment 1: Released, due Thursday, April 20th at 11:59 p.m.
    • Link available on the course website.
    • Due date adjusted to allow a full two weeks for completion.
    • Submit final zip file on Canvas.
  • Piazza: Check for administrative updates and example project ideas.
    • Example project ideas posted; contact mentors directly.
  • Office Hours: Posted on the course website as a Google calendar.
  • Google Cloud Credits: Each student can receive an additional $100 credit for use in assignments and projects. Instructions will be sent out.

Lecture Content

Recap of Lecture 2

  • Data-Driven Approach: Challenges in image classification.
  • k-Nearest Neighbor Classifier: Introduction to data-driven mindset.
  • Linear Classification: Learning one template per class via the parameter matrix W.
  • Image Classification Problem: Stretching images out into vectors and scoring them with the linear classifier.
  • Hyperparameters and Cross Validation: Strategies discussed.

Today's Topic: Loss Functions and Optimization

Linear Classifier Recap

  • Linear Classifier: Parameter matrix W, stretching images into vectors.
  • Loss Function: Quantifies how bad any given W is and guides the optimization.
  • Optimization: Searching for the least bad W.

Loss Functions

  1. Multi-Class SVM Loss:
    • Formulation: Compares the score of the correct class against the scores of the incorrect classes.
    • Per-example loss: Sum over each incorrect class j of max(0, s_j − s_correct + 1), i.e., the incorrect-class score minus the correct-class score plus a margin of 1, clipped at zero.
    • Min/Max Loss: Min is 0, Max is infinity.
    • Initial Loss: If all scores are approximately zero (e.g., at initialization with small weights), the loss is roughly C − 1 (number of classes minus one); a useful sanity check.
    • Regularization: Incorporate L2, L1, or other regularization to avoid overfitting.
  2. Softmax/Multinomial Logistic Regression Loss:
    • Formulation: Probabilistic interpretation; scores are converted into class probabilities via the softmax function.
    • Loss: Negative log probability of the true class (both losses are sketched in NumPy after this list).
    • Min/Max Loss: Min is theoretically 0 (never quite attained, since the correct-class probability never reaches exactly 1), Max is infinity.
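
As a rough illustration of the two loss formulations above, here is a minimal NumPy sketch (not code from the lecture; function and variable names are my own) that computes both losses for a batch of scores:

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss, averaged over the batch.

    scores: (N, C) array of class scores; y: (N,) integer labels.
    """
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]           # correct-class score per example
    margins = np.maximum(0, scores - correct + margin)   # hinge on every class
    margins[np.arange(N), y] = 0                         # don't count the correct class
    return margins.sum() / N

def softmax_loss(scores, y):
    """Softmax / cross-entropy loss: negative log probability of the true class."""
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(N), y].mean()

# Sanity check: with all-zero scores the SVM loss is C - 1,
# and the softmax loss is log(C).
scores = np.zeros((5, 10))
y = np.zeros(5, dtype=int)
print(svm_loss(scores, y))      # 9.0
print(softmax_loss(scores, y))  # ~2.30 = log(10)
```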

Regularization and Generalization

  • Regularization: Prevents overfitting by encouraging simpler models (Occam's Razor).
  • Types:
    • L2 Regularization (Weight Decay): Penalizes the squared Euclidean norm of W (see the sketch after this list).
    • L1 Regularization: Encourages sparsity.
    • Elastic Net: Combination of L1 and L2.
    • Max Norm Regularization.
    • Others: Dropout, batch normalization, and other techniques specific to deep learning.
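
A minimal sketch (my own, illustrative) of how such a penalty is typically combined with the data loss; the function names and the value of `reg` are assumptions, not from the lecture:

```python
import numpy as np

def total_loss(W, X, y, data_loss_fn, reg=1e-4):
    """Full objective = data loss + regularization loss.

    W: (D, C) weights; X: (N, D) data; y: (N,) labels.
    data_loss_fn: e.g. svm_loss or softmax_loss from the sketch above.
    reg: regularization strength, a hyperparameter.
    """
    scores = X @ W                       # linear classifier scores
    data_loss = data_loss_fn(scores, y)
    reg_loss = reg * np.sum(W * W)       # L2 penalty; np.sum(np.abs(W)) would give L1
    return data_loss + reg_loss
```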

Optimization

  • Gradient Descent: Basic algorithm for minimizing the loss function by iterative updates.
    • Stochastic Gradient Descent: Uses a minibatch of data for cheaper, more frequent updates (sketched below).
    • Learning Rate: Crucial hyperparameter.
    • Advanced Techniques: Variants such as momentum and the Adam optimizer improve convergence.
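
A bare-bones minibatch SGD loop, sketched under the assumption of a hypothetical `loss_and_grad(W, X_batch, y_batch)` callback that returns the loss and an analytic gradient of the same shape as `W`; this is illustrative, not the assignment's API:

```python
import numpy as np

def sgd(W, X_train, y_train, loss_and_grad, lr=1e-3, batch_size=256, num_iters=1000):
    """Vanilla minibatch stochastic gradient descent."""
    N = X_train.shape[0]
    for it in range(num_iters):
        idx = np.random.choice(N, batch_size)                   # sample a minibatch
        loss, dW = loss_and_grad(W, X_train[idx], y_train[idx])
        W -= lr * dW                                            # step opposite the gradient
    return W
```

The learning rate `lr` is the crucial hyperparameter noted above; momentum and Adam change only the update line while the rest of the loop stays the same.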

Practical Tips

  • Numerical Gradients: Use for debugging and verifying analytic gradients (see the sketch below); too slow for actual training.
  • Interactive Demos: Tools are available to visualize and better understand the optimization process.
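
One common way to do such a check is a centered-difference numerical gradient; the sketch below is my own (names are assumptions) and is only practical for small W, since it re-evaluates the loss twice per parameter:

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference estimate of df/dW for a scalar-valued f(W)."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = W[i]
        W[i] = old + h; f_plus = f(W)    # loss slightly above the current value
        W[i] = old - h; f_minus = f(W)   # loss slightly below
        W[i] = old                       # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad
```

Compare this against the analytic gradient (e.g. via relative error) while debugging, then train with the analytic gradient only.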

Image Features

  • Pre-Deep Learning: Manual feature extraction stages were common.
    • Color Histograms: Counting how often each color appears in an image, ignoring spatial layout (sketched below).
    • Histogram of Oriented Gradients (HOG): Capturing edge directions in an image.
    • Bag of Words: Inspired by NLP; cluster local image patches into a codebook of "visual words" and represent an image by their counts.
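
For the simplest of these hand-designed features, here is an illustrative color-histogram sketch (assuming an RGB `uint8` image; not code from the lecture):

```python
import numpy as np

def color_histogram(img, bins_per_channel=8):
    """Concatenated per-channel color histogram of an RGB image.

    img: (H, W, 3) uint8 array. Returns a normalized vector of length
    3 * bins_per_channel; all spatial information is discarded.
    """
    feats = []
    for c in range(3):                   # one histogram per R, G, B channel
        hist, _ = np.histogram(img[:, :, c], bins=bins_per_channel, range=(0, 256))
        feats.append(hist)
    feats = np.concatenate(feats).astype(np.float32)
    return feats / (feats.sum() + 1e-8)  # normalize so image size doesn't matter
```

Such feature vectors were then fed to a linear classifier, just as raw pixel vectors are in this lecture.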

Next Lecture

  • Dive deeper into neural networks, learning features from data instead of hand-designing them, and backpropagation.