CS 231N Lecture 3: Loss Functions and Optimization
Jul 17, 2024
Administrative Notes
Assignment 1: Released, due Thursday, April 20th at 11:59 p.m.
Link available on the course website.
Due date adjusted to allow a full two weeks for completion.
Submit the final zip file on Canvas.
Piazza: Check for administrative updates.
Example project ideas posted; contact mentors directly.
Office Hours: Posted on the course website as a Google calendar.
Google Cloud Credits: Each student can receive an additional $100 credit for use in assignments and projects. Instructions will be sent out.
Lecture Content
Recap of Lecture 2
Data-Driven Approach: Challenges in image classification.
k-Nearest Neighbor Classifier: Introduction to the data-driven mindset.
Linear Classification: Learning one template per class, stored in the rows of the parameter matrix W.
Image Classification Problem: Stretching images out into vectors and scoring them with the linear classifier.
Hyperparameters and Cross-Validation: Strategies for choosing hyperparameters were discussed.
Today's Topic: Loss Functions and Optimization
Linear Classifier Recap
Linear Classifier: Parameter matrix W; images are stretched into vectors and scored as s = Wx + b (see the sketch below).
Loss Function: Quantifies how bad a given W is and guides optimization.
Optimization: Searching for the least bad W.
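To make the recap concrete, here is a minimal sketch of the linear score function from Lecture 2 in NumPy; the CIFAR-10-style shapes (10 classes, 3072-dimensional image vectors) are illustrative assumptions, not tied to the assignment code.

```python
import numpy as np

# Illustrative CIFAR-10-style shapes: 10 classes, 32x32x3 = 3072 pixels per image.
num_classes, dim = 10, 3072

W = 0.0001 * np.random.randn(num_classes, dim)  # small random weights (one template per row)
b = np.zeros(num_classes)                       # one bias per class

x = np.random.randn(dim)             # a single image, stretched out into a vector
scores = W.dot(x) + b                # one score per class
predicted_class = np.argmax(scores)  # highest score wins
```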
Loss Functions
Multi-Class SVM Loss:
Formulation: Compare the score of the correct class against each incorrect class.
Example: For one example, L_i = sum over j ≠ y_i of max(0, s_j − s_{y_i} + 1), i.e., the hinge margins of the incorrect classes relative to the correct class, with a margin of 1 (sketch below).
Min/Max Loss: Minimum is 0, maximum is infinity.
Initial Loss: Roughly C − 1 (number of classes minus one) if all scores are approximately zero; useful as a sanity check at initialization.
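A minimal sketch of the multi-class SVM (hinge) loss for a single example, matching the formulation above; the function name and the all-zero-score sanity check are illustrative.

```python
import numpy as np

def svm_loss_single(scores, y, margin=1.0):
    """Multi-class SVM loss for one example.

    scores: 1D array of class scores s
    y:      index of the correct class y_i
    """
    margins = np.maximum(0.0, scores - scores[y] + margin)  # max(0, s_j - s_{y_i} + 1)
    margins[y] = 0.0  # the correct class does not contribute to the loss
    return margins.sum()

# Sanity check from the lecture: with C = 3 classes and all scores ~0,
# the loss is about C - 1 = 2.
print(svm_loss_single(np.zeros(3), y=0))  # -> 2.0
```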
Regularization: Incorporate an L2, L1, or other regularization term to avoid overfitting.
Softmax / Multinomial Logistic Regression Loss:
Formulation: Probabilistic interpretation; scores are transformed into class probabilities via the softmax function.
Loss: Negative log probability of the true class, L_i = −log( exp(s_{y_i}) / sum_j exp(s_j) ) (sketch below).
Min/Max Loss: Minimum is theoretically 0, maximum is infinity.
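A minimal sketch of the softmax (cross-entropy) loss for a single example; shifting by the maximum score is a standard numerical-stability trick and not specific to this lecture.

```python
import numpy as np

def softmax_loss_single(scores, y):
    """Softmax / multinomial logistic loss for one example.

    scores: 1D array of class scores s
    y:      index of the true class y_i
    """
    shifted = scores - np.max(scores)                     # avoid overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]                                  # -log P(y_i | x_i)

# Sanity check: with all scores equal, every class has probability 1/C,
# so the loss is log(C).
print(softmax_loss_single(np.zeros(10), y=3))  # -> log(10) ≈ 2.3026
```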
Regularization and Generalization
Regularization: Prevents overfitting; prefers simpler models (Occam's Razor).
Types (see the sketch after this list):
L2 Regularization (Weight Decay): Penalizes the (squared) Euclidean norm of W.
L1 Regularization: Encourages sparsity in W.
Elastic Net: Combination of L1 and L2.
Max Norm Regularization.
Others: Dropout, batch normalization, and other techniques specific to deep learning.
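A sketch of how a regularization penalty is added to the data loss to form the full objective; `data_loss`, `lam`, and the helper names are placeholders rather than the lecture's exact code.

```python
import numpy as np

def l2_penalty(W):
    return np.sum(W * W)      # squared Euclidean norm: prefers small, diffuse weights

def l1_penalty(W):
    return np.sum(np.abs(W))  # encourages many exactly-zero (sparse) weights

def elastic_net_penalty(W, beta=0.5):
    return beta * l2_penalty(W) + (1.0 - beta) * l1_penalty(W)

def full_loss(data_loss, W, lam=1e-3, penalty=l2_penalty):
    # Full objective = data loss (SVM or softmax) + lambda * R(W),
    # where lambda trades off fitting the training data against model simplicity.
    return data_loss + lam * penalty(W)
```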
Optimization
Gradient Descent: Basic algorithm for minimizing the loss function by iteratively stepping in the direction of the negative gradient.
Stochastic Gradient Descent (SGD): Uses a minibatch of data for more efficient updates (sketch below).
Learning Rate: Crucial hyperparameter controlling the step size.
Advanced Techniques: Variants such as SGD with momentum and the Adam optimizer improve performance.
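A minimal sketch of vanilla minibatch SGD; it assumes a `loss_and_grad(W, X_batch, y_batch)` helper returning the loss and its analytic gradient, and the batch size, learning rate, and iteration count are illustrative.

```python
import numpy as np

def train_sgd(W, X_train, y_train, loss_and_grad,
              learning_rate=1e-3, batch_size=256, num_iters=1000):
    """Vanilla minibatch stochastic gradient descent."""
    num_train = X_train.shape[0]
    for _ in range(num_iters):
        # Sample a random minibatch rather than touching the whole dataset.
        idx = np.random.choice(num_train, batch_size, replace=False)
        loss, grad = loss_and_grad(W, X_train[idx], y_train[idx])
        # Parameter update: step along the negative gradient.
        W -= learning_rate * grad
    return W
```

Momentum and Adam change only the update line, keeping running statistics of past gradients while the rest of the loop stays the same.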
Practical Tips
Numerical Gradients: Use for debugging and verifying analytic gradients (gradient check; see the sketch after this list).
Interactive Demos: Tools are available to visualize and better understand the optimization process.
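A sketch of a centered-difference numerical gradient for spot-checking an analytic gradient, as the debugging tip above suggests; `f` stands for any scalar loss as a function of the weights.

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference approximation of dL/dW (slow; use only for gradient checks)."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = W[i]
        W[i] = old + h; f_plus = f(W)    # f evaluated with this coordinate nudged up
        W[i] = old - h; f_minus = f(W)   # f evaluated with this coordinate nudged down
        W[i] = old                       # restore the original value
        grad[i] = (f_plus - f_minus) / (2.0 * h)
        it.iternext()
    return grad

# Compare to the analytic gradient via relative error,
# e.g. |g_num - g_ana| / max(|g_num|, |g_ana|); around 1e-7 or smaller is usually fine.
```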
Image Features
Pre-Deep Learning: Hand-crafted feature extraction stages were common; features were computed first and a linear classifier was trained on top of them.
Color Histograms: Counting how often each color appears in an image (sketch after this list).
Histogram of Oriented Gradients (HOG): Capturing the dominant edge directions in local patches of an image.
Bag of Words: Inspired by NLP; an image is represented by a histogram of "visual words".
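A minimal sketch of one such hand-crafted feature, a simple color histogram; the lecture's example bins hue values, so binning the RGB channels here, and the bin count, are illustrative simplifications.

```python
import numpy as np

def color_histogram_feature(img, bins=16):
    """Concatenated per-channel color histogram for an H x W x 3 RGB image in [0, 255]."""
    feats = []
    for c in range(3):  # R, G, B
        hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize so the feature is independent of image size
    return np.concatenate(feats)         # fixed-length vector, usable by a linear classifier
```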
Next Lecture
Dive deeper into neural networks, features learned directly from data, and backpropagation.