
CS 231N: Loss Functions and Optimization

Jul 17, 2024

CS 231N Lecture 3: Loss Functions and Optimization

Administrative Notes

  • Assignment 1: Released, due Thursday, April 20th at 11:59 p.m.
    • Link available on the course website.
    • Due date adjusted to allow a full two weeks for completion.
    • Submit final zip file on Canvas.
  • Piazza: Check for administrative updates and example project ideas.
    • Example project ideas posted; contact mentors directly.
  • Office Hours: Posted on the course website as a Google calendar.
  • Google Cloud Credits: Each student can receive an additional $100 credit for use in assignments and projects. Instructions will be sent out.

Lecture Content

Recap of Lecture 2

  • Data-Driven Approach: Challenges in image classification.
  • k-Nearest Neighbor Classifier: Introduction to data-driven mindset.
  • Linear Classification: Learning one template per class via the parameter matrix W.
  • Image Classification Problem: Stretching images out into vectors and scoring them with the linear classifier.
  • Hyperparameters and Cross Validation: Strategies discussed.

Today's Topic: Loss Functions and Optimization

Linear Classifier Recap

  • Linear Classifier: Parameter matrix W, stretching images into vectors.
  • Loss Function: Quantifies how bad any given W is and guides the optimization.
  • Optimization: Searching for the least bad W.

Loss Functions

  1. Multi-Class SVM Loss:
    • Formulation: Compares the score of the correct class against the scores of the incorrect classes.
    • Per-example loss: Sum over each incorrect class j of max(0, s_j − s_correct + 1), i.e., the incorrect-class score minus the correct-class score plus a margin of 1, clipped at zero.
    • Min/Max Loss: Min is 0, Max is infinity.
    • Initial Loss: If all scores are approximately zero (e.g., at initialization with small weights), the loss is roughly C − 1 (number of classes minus one); a useful sanity check.
    • Regularization: Incorporate L2, L1, or other regularization to avoid overfitting.
  2. Softmax/Multinomial Logistic Regression Loss:
    • Formulation: Probabilistic interpretation; scores are converted into class probabilities via the softmax function.
    • Loss: Negative log probability of the true class (both losses are sketched in NumPy after this list).
    • Min/Max Loss: Min is theoretically 0 (never quite attained, since the correct-class probability never reaches exactly 1), Max is infinity.
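
As a rough illustration of the two loss formulations above, here is a minimal NumPy sketch (not code from the lecture; function and variable names are my own) that computes both losses for a batch of scores:

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss, averaged over the batch.

    scores: (N, C) array of class scores; y: (N,) integer labels.
    """
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]           # correct-class score per example
    margins = np.maximum(0, scores - correct + margin)   # hinge on every class
    margins[np.arange(N), y] = 0                         # don't count the correct class
    return margins.sum() / N

def softmax_loss(scores, y):
    """Softmax / cross-entropy loss: negative log probability of the true class."""
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(N), y].mean()

# Sanity check: with all-zero scores the SVM loss is C - 1,
# and the softmax loss is log(C).
scores = np.zeros((5, 10))
y = np.zeros(5, dtype=int)
print(svm_loss(scores, y))      # 9.0
print(softmax_loss(scores, y))  # ~2.30 = log(10)
```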

Regularization and Generalization

  • Regularization: Prevents overfitting by encouraging simpler models (Occam's Razor).
  • Types:
    • L2 Regularization (Weight Decay): Penalizes the squared Euclidean norm of W (see the sketch after this list).
    • L1 Regularization: Encourages sparsity.
    • Elastic Net: Combination of L1 and L2.
    • Max Norm Regularization.
    • Others: Dropout, batch normalization, and other techniques specific to deep learning.
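
A minimal sketch (my own, illustrative) of how such a penalty is typically combined with the data loss; the function names and the value of `reg` are assumptions, not from the lecture:

```python
import numpy as np

def total_loss(W, X, y, data_loss_fn, reg=1e-4):
    """Full objective = data loss + regularization loss.

    W: (D, C) weights; X: (N, D) data; y: (N,) labels.
    data_loss_fn: e.g. svm_loss or softmax_loss from the sketch above.
    reg: regularization strength, a hyperparameter.
    """
    scores = X @ W                       # linear classifier scores
    data_loss = data_loss_fn(scores, y)
    reg_loss = reg * np.sum(W * W)       # L2 penalty; np.sum(np.abs(W)) would give L1
    return data_loss + reg_loss
```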

Optimization

  • Gradient Descent: Basic algorithm for minimizing the loss function by iterative updates.
    • Stochastic Gradient Descent: Uses a minibatch of data for cheaper, more frequent updates (sketched below).
    • Learning Rate: Crucial hyperparameter.
    • Advanced Techniques: Variants such as momentum and the Adam optimizer improve convergence.
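
A bare-bones minibatch SGD loop, sketched under the assumption of a hypothetical `loss_and_grad(W, X_batch, y_batch)` callback that returns the loss and an analytic gradient of the same shape as `W`; this is illustrative, not the assignment's API:

```python
import numpy as np

def sgd(W, X_train, y_train, loss_and_grad, lr=1e-3, batch_size=256, num_iters=1000):
    """Vanilla minibatch stochastic gradient descent."""
    N = X_train.shape[0]
    for it in range(num_iters):
        idx = np.random.choice(N, batch_size)                   # sample a minibatch
        loss, dW = loss_and_grad(W, X_train[idx], y_train[idx])
        W -= lr * dW                                            # step opposite the gradient
    return W
```

The learning rate `lr` is the crucial hyperparameter noted above; momentum and Adam change only the update line while the rest of the loop stays the same.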

Practical Tips

  • Numerical Gradients: Use for debugging and verifying analytic gradients (see the sketch below); too slow for actual training.
  • Interactive Demos: Tools are available to visualize and better understand the optimization process.
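
One common way to do such a check is a centered-difference numerical gradient; the sketch below is my own (names are assumptions) and is only practical for small W, since it re-evaluates the loss twice per parameter:

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference estimate of df/dW for a scalar-valued f(W)."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = W[i]
        W[i] = old + h; f_plus = f(W)    # loss slightly above the current value
        W[i] = old - h; f_minus = f(W)   # loss slightly below
        W[i] = old                       # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad
```

Compare this against the analytic gradient (e.g. via relative error) while debugging, then train with the analytic gradient only.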

Image Features

  • Pre-Deep Learning: Manual feature extraction stages were common.
    • Color Histograms: Counting how often each color appears in an image, ignoring spatial layout (sketched below).
    • Histogram of Oriented Gradients (HOG): Capturing edge directions in an image.
    • Bag of Words: Inspired by NLP; cluster local image patches into a codebook of "visual words" and represent an image by their counts.
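
For the simplest of these hand-designed features, here is an illustrative color-histogram sketch (assuming an RGB `uint8` image; not code from the lecture):

```python
import numpy as np

def color_histogram(img, bins_per_channel=8):
    """Concatenated per-channel color histogram of an RGB image.

    img: (H, W, 3) uint8 array. Returns a normalized vector of length
    3 * bins_per_channel; all spatial information is discarded.
    """
    feats = []
    for c in range(3):                   # one histogram per R, G, B channel
        hist, _ = np.histogram(img[:, :, c], bins=bins_per_channel, range=(0, 256))
        feats.append(hist)
    feats = np.concatenate(feats).astype(np.float32)
    return feats / (feats.sum() + 1e-8)  # normalize so image size doesn't matter
```

Such feature vectors were then fed to a linear classifier, just as raw pixel vectors are in this lecture.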

Next Lecture

  • Dive deeper into neural networks, learning features from data instead of hand-designing them, and backpropagation.