Understanding Optimization in Machine Learning

Oct 10, 2024

Day Four: Optimization Lecture

Introduction

  • Lecture conducted by Ioannis Mitliagkas and Jose Gallego-Posada.
  • Ioannis Mitliagkas: Assistant Professor at University of Montreal, core member of MILA, amateur musician.
  • Acknowledgments to academy organizers, Jose Gallego-Posada for practical content, and Lyle and Konrad for borrowed material.

What is Optimization?

  • Importance of Optimization:
    • Optimization begins by defining what it means to make things "better", i.e. by choosing an objective.
    • Focus on two main questions:
      1. What do we optimize?
      2. How do we do it?

What do We Optimize?

  • Choosing the Quantity:

    • Need a scalar quantity to measure 'goodness' (maximize/minimize).
    • Focus mainly on minimization.
    • Example: Choosing a loss function in Machine Learning (e.g., Mean Squared Error, Cross Entropy).
  • Loss Functions:

    • Different loss functions can have different impacts.
    • Cross-entropy loss is usually preferred for multi-class classification: it treats the model outputs as class probabilities and yields better-behaved gradients than Mean Squared Error.
  • Evaluation Metrics:

    • Accuracy vs. Area Under the ROC Curve (AUC).
    • Class Imbalance Example:
      • In a dataset where 99.9% of patients are cancer-free, a model that predicts everyone as healthy reaches ~99.9% accuracy while detecting no actual cases.
      • AUC is a more informative metric under class imbalance (see the sketch after this list).
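
The class-imbalance point can be made concrete with a small sketch. The synthetic labels and the use of scikit-learn's metrics are my own illustrative choices, not from the lecture: a model that predicts "healthy" for everyone scores roughly 99.9% accuracy but only 0.5 ROC-AUC, i.e. no better than chance.

```python
# Minimal sketch (illustrative assumptions): accuracy vs. ROC-AUC on a
# dataset where ~99.9% of examples are negative ("healthy").
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

n = 100_000
y_true = (rng.random(n) < 0.001).astype(int)   # 1 = has cancer, ~0.1% of cases

y_pred = np.zeros(n, dtype=int)                # always predict "healthy"
y_score = np.zeros(n)                          # constant score -> ROC curve is the diagonal

print("accuracy:", accuracy_score(y_true, y_pred))   # ~0.999
print("ROC-AUC :", roc_auc_score(y_true, y_score))   # 0.5 (chance level)
```

The trivial predictor looks excellent under accuracy but is exposed by AUC, which is why AUC (or similar ranking metrics) is preferred when classes are heavily imbalanced.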

Societal Considerations

  • Fairness in Algorithms:
    • Questions about algorithm performance and fairness across demographics.
    • Ethical concerns in sensitive applications of Machine Learning.
    • Importance of considering unintended consequences of chosen objectives.

Examples of Unintended Consequences

  1. Cobra Effect (India):

    • A bounty on cobras led farmers to breed cobras to collect the reward, and the program ultimately increased the total cobra population.
  2. Reinforcement Learning in Games:

    • An agent optimized for points in a boat-racing game learned to circle repeatedly collecting bonus targets instead of finishing the race.

How Do We Optimize?

  • Complexity of Optimization in Deep Learning:
    • Importance of methodologies and engineering tips for effective optimization.
    • Example: a single training run of an 11-billion-parameter model can cost over $1 million, which makes cost-effective optimization essential.

Summary of Today's Content

  • Micro Lectures Outline:
    1. Importance of Optimization (completed).
    2. Case study: MLP classification with Gradient Descent and Momentum (a minimal momentum sketch follows this outline).
    3. Exploring non-convexity.
    4. Value of mini-batches and adaptive methods.
    5. Final lab and ethical considerations.
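
As a preview of micro-lecture 2, here is a minimal sketch of gradient descent with (heavy-ball) momentum on a toy quadratic. The lab itself uses an MLP classifier, and the step size and momentum coefficient below are illustrative assumptions rather than the lecture's settings.

```python
# Minimal sketch (assumed hyperparameters): gradient descent with momentum
# on a simple ill-conditioned quadratic f(w) = 0.5 * w^T A w.
import numpy as np

def grad(w):
    # Gradient of f(w) for A = diag(1, 10).
    A = np.diag([1.0, 10.0])
    return A @ w

w = np.array([1.0, 1.0])   # initial point
v = np.zeros_like(w)       # velocity (momentum) buffer
lr, beta = 0.05, 0.9       # step size and momentum coefficient (illustrative)

for step in range(200):
    v = beta * v + grad(w)     # accumulate an exponentially weighted sum of gradients
    w = w - lr * v             # step along the accumulated direction

print(w)  # approaches the minimizer [0, 0]
```

The momentum term reuses past gradients to damp oscillations along steep directions and speed up progress along shallow ones, which is the behaviour the case study examines on an MLP.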

Conclusion

  • Encouragement for participation and engagement in the day's activities.