MIT 6.S191: Introduction to Deep Learning
Instructor: Alexander Amini
Overview
- Fast-paced and intense one-week course
- Covers the rapidly changing field of AI and Deep Learning
- Course structure has evolved significantly over the years
- AI and deep learning now achieving and surpassing human performance in various fields
Importance of AI and Deep Learning
- AI revolutionizing many areas: science, mathematics, physics, etc.
- Rapid advancements making introductory lectures difficult to keep current
- Content generated by deep learning is becoming commonplace
Course Structure
- A mix of technical lectures and software labs
- Example Labs: Music generation, facial detection, large language models
- Includes guest lectures from industry leaders
- Final project pitch competition with prizes
Foundations of Deep Learning
What is Intelligence?
- Intelligence: The ability to process information in order to inform future decisions
- Artificial Intelligence: Giving computers the ability to process information and make decisions
- Machine Learning: Teaching computers to process information from data, removing hard-coded rules
- Deep Learning: Subset of ML using neural networks to process raw data
The Perceptron
- Building block of neural networks
- Steps:
- Ingest multiple inputs
- Multiply each input by a corresponding weight
- Sum the results
- Add bias term
- Apply a nonlinear activation function (e.g., sigmoid, ReLU)
- Importance of nonlinearity: Allows the model to capture complex, real-world data (see the code sketch after this list)
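A minimal sketch of a single perceptron forward pass in Python; the input, weight, and bias values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # nonlinear activation squashing any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # weighted sum of inputs plus bias, passed through the nonlinearity
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 2.0, -1.0])   # inputs
w = np.array([0.5, -0.3, 0.8])   # weights
b = 0.1                          # bias
print(perceptron(x, w, b))
```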
Neural Networks
- Composed of layers of perceptrons (neurons)
- Single-Layer Network: Inputs feed one hidden layer, which feeds the output
- Multi-Layer (Deep) Network: Stacked layers creating hierarchical models
- Use of nonlinear activation functions between layers
- Code Implementation: Libraries such as TensorFlow are used to define and train networks (a minimal sketch follows)
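A minimal sketch of stacking dense layers in TensorFlow; the layer sizes and the 10-way output are arbitrary choices for illustration:

```python
import tensorflow as tf

# a small multi-layer (deep) network: two hidden layers with ReLU nonlinearities,
# followed by an output layer producing logits for 10 classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),
])

# forward pass on a dummy batch of 1 example with 784 features
logits = model(tf.random.normal((1, 784)))
print(logits.shape)   # (1, 10)
```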
Training Neural Networks
Gradient Descent
- Optimization method to minimize the loss function (the steps are sketched in code after this list)
- Steps:
- Initialize weights randomly
- Compute loss
- Calculate gradients
- Update weights in opposite direction of gradient
- Repeat until convergence
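A minimal sketch of these steps on a one-parameter toy loss, L(w) = (w - 3)^2; the target value, learning rate, and step count are arbitrary:

```python
import numpy as np

w = np.random.randn()        # 1. initialize the weight randomly
lr = 0.1                     # learning rate (step size)
for step in range(100):
    loss = (w - 3.0) ** 2    # 2. compute the loss
    grad = 2.0 * (w - 3.0)   # 3. compute the gradient dL/dw
    w = w - lr * grad        # 4. step in the opposite direction of the gradient
print(w)                     # 5. after enough repeats, w converges toward 3.0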
Loss Functions
- Softmax cross entropy for classification tasks
- Mean squared error for regression tasks
- Objective: Minimize the difference between predicted and actual outputs (see the code sketch after this list)
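A minimal sketch of both losses using TensorFlow's built-in loss classes; the labels, logits, and targets are toy values:

```python
import tensorflow as tf

# classification: softmax cross entropy between integer labels and raw logits
labels = tf.constant([1, 0])
logits = tf.constant([[1.2, 3.1, -0.4],
                      [2.0, -1.0, 0.3]])
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(ce(labels, logits).numpy())

# regression: mean squared error between targets and predictions
y_true = tf.constant([[1.0], [2.5]])
y_pred = tf.constant([[0.8], [2.9]])
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())
```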
Backpropagation
- Algorithm to compute gradients
- Uses chain rule to propagate error backwards through the network
- Adjusts weights to minimize the loss (see the sketch after this list)
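In practice, frameworks apply the chain rule automatically. A minimal sketch using TensorFlow's GradientTape on a single sigmoid unit; the toy data and the use of mean squared error here are just for illustration:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
y = tf.constant([[1.0]])
w = tf.Variable(tf.random.normal((2, 1)))
b = tf.Variable(tf.zeros((1,)))

with tf.GradientTape() as tape:
    y_hat = tf.sigmoid(tf.matmul(x, w) + b)    # forward pass
    loss = tf.reduce_mean((y - y_hat) ** 2)    # loss on this example

# backpropagation: gradients of the loss with respect to each parameter,
# computed by applying the chain rule backwards through the computation
grads = tape.gradient(loss, [w, b])
print([g.shape for g in grads])
```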
Practical Aspects of Training
Learning Rates
- Setting learning rates can be challenging
- If too small, convergence is slow
- If too large, network may diverge or overshoot
- Adaptive learning rates help optimize training (see the sketch after this list)
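A minimal sketch contrasting a fixed-learning-rate optimizer with an adaptive one in TensorFlow; the learning rate values and the toy variable are arbitrary:

```python
import tensorflow as tf

# fixed step size: the same learning rate is used for every parameter and step
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# adaptive optimizer: Adam adjusts the effective step size per parameter over time
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)

# one update step on a toy variable
w = tf.Variable(5.0)
with tf.GradientTape() as tape:
    loss = (w - 1.0) ** 2
grads = tape.gradient(loss, [w])
adam.apply_gradients(zip(grads, [w]))
print(w.numpy())
```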
Batch Training
- Computing the gradient over the entire dataset at every step is computationally infeasible
- Stochastic Gradient Descent (SGD): Uses a single data point per gradient step (fast but very noisy)
- Mini-Batch Gradient Descent: Uses a small batch of data points, balancing gradient accuracy and efficiency
- Batches also allow parallel computation on GPUs (see the sketch after this list)
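A minimal sketch of mini-batching with tf.data; the random dataset, feature size, and batch size of 32 are arbitrary:

```python
import numpy as np
import tensorflow as tf

# toy dataset of 1000 examples with 8 features each
x = np.random.randn(1000, 8).astype('float32')
y = np.random.randint(0, 2, size=(1000,))

# shuffle and split into mini-batches; each gradient step then uses one batch
# rather than a single example or the full dataset
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1000).batch(32)

for x_batch, y_batch in dataset.take(1):
    print(x_batch.shape)   # (32, 8)
```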
Avoiding Overfitting
- Regularization: Techniques that discourage the model from memorizing the training data
- Dropout: Randomly sets a fraction of neuron activations to zero during training
- Early Stopping: Stops training when performance on a held-out validation set starts to degrade (see the sketch after this list)
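A minimal sketch of both techniques in TensorFlow: a Dropout layer inside the model and an EarlyStopping callback monitoring validation loss. The dropout rate, patience, and layer sizes are arbitrary, and the fit call is commented out because it assumes hypothetical x_train/y_train arrays:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),   # randomly zero 50% of activations during training
    tf.keras.layers.Dense(10),
])

# stop training once validation loss stops improving for 3 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                              restore_best_weights=True)

# model.compile(optimizer='adam',
#               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```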
Course Resources
- Lecture slides available online
- Piazza for questions and discussions
- Teaching team available for support
Conclusion
- This course provides foundational knowledge to create and understand deep learning models
- Prepare for rapid advancements in the field
Next Lecture
- Deep sequence modeling using RNNs and Transformers by Ava