MIT 6.S191: Introduction to Deep Learning
Instructor: Alexander Amini
Overview
- Fast-paced and intense one-week course
- Covers the rapidly changing field of AI and Deep Learning
- Course structure has evolved significantly over the years
- AI and deep learning now achieving and surpassing human performance in various fields
Importance of AI and Deep Learning
- AI revolutionizing many areas: science, mathematics, physics, etc.
- Rapid advancements making introductory lectures difficult to keep current
- Content generated by deep learning is becoming commonplace
Course Structure
- A mix of technical lectures and software labs
- Example Labs: Music generation, facial detection, large language models
- Includes guest lectures from industry leaders
- Final project pitch competition with prizes
Foundations of Deep Learning
What is Intelligence?
- Intelligence: The ability to process information in order to inform future decisions
- Artificial Intelligence: Giving computers the ability to process information and make decisions
- Machine Learning: Teaching computers to process information from data, removing hard-coded rules
- Deep Learning: Subset of ML using neural networks to process raw data
The Perceptron
- Building block of neural networks
- Steps:
- Ingest multiple inputs
- Multiply each input by a corresponding weight
- Sum the results
- Add bias term
- Apply a nonlinear activation function (e.g., sigmoid, ReLU)
- Importance of nonlinearity: Allows the model to capture complex, real-world data (see the code sketch after this list)
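A minimal sketch of a single perceptron forward pass in Python; the input, weight, and bias values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # nonlinear activation squashing any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # weighted sum of inputs plus bias, passed through the nonlinearity
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 2.0, -1.0])   # inputs
w = np.array([0.5, -0.3, 0.8])   # weights
b = 0.1                          # bias
print(perceptron(x, w, b))
```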
Neural Networks
- Composed of layers of perceptrons (neurons)
- Single-Layer Network: Inputs feed one hidden layer, which feeds the output
- Multi-Layer (Deep) Network: Stacked layers creating hierarchical models
- Use of nonlinear activation functions between layers
- Code Implementation: Libraries such as TensorFlow are used to define and train networks (a minimal sketch follows)
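A minimal sketch of stacking dense layers in TensorFlow; the layer sizes and the 10-way output are arbitrary choices for illustration:

```python
import tensorflow as tf

# a small multi-layer (deep) network: two hidden layers with ReLU nonlinearities,
# followed by an output layer producing logits for 10 classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),
])

# forward pass on a dummy batch of 1 example with 784 features
logits = model(tf.random.normal((1, 784)))
print(logits.shape)   # (1, 10)
```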
Training Neural Networks
Gradient Descent
- Optimization method to minimize the loss function (the steps are sketched in code after this list)
- Steps:
- Initialize weights randomly
- Compute loss
- Calculate gradients
- Update weights in opposite direction of gradient
- Repeat until convergence
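A minimal sketch of these steps on a one-parameter toy loss, L(w) = (w - 3)^2; the target value, learning rate, and step count are arbitrary:

```python
import numpy as np

w = np.random.randn()        # 1. initialize the weight randomly
lr = 0.1                     # learning rate (step size)
for step in range(100):
    loss = (w - 3.0) ** 2    # 2. compute the loss
    grad = 2.0 * (w - 3.0)   # 3. compute the gradient dL/dw
    w = w - lr * grad        # 4. step in the opposite direction of the gradient
print(w)                     # 5. after enough repeats, w converges toward 3.0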
Loss Functions
- Softmax cross entropy for classification tasks
- Mean squared error for regression tasks
- Objective: Minimize the difference between predicted and actual outputs (see the code sketch after this list)
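A minimal sketch of both losses using TensorFlow's built-in loss classes; the labels, logits, and targets are toy values:

```python
import tensorflow as tf

# classification: softmax cross entropy between integer labels and raw logits
labels = tf.constant([1, 0])
logits = tf.constant([[1.2, 3.1, -0.4],
                      [2.0, -1.0, 0.3]])
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(ce(labels, logits).numpy())

# regression: mean squared error between targets and predictions
y_true = tf.constant([[1.0], [2.5]])
y_pred = tf.constant([[0.8], [2.9]])
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())
```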
Backpropagation
- Algorithm to compute gradients
- Uses chain rule to propagate error backwards through the network
- Adjusts weights to minimize the loss (see the sketch after this list)
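In practice, frameworks apply the chain rule automatically. A minimal sketch using TensorFlow's GradientTape on a single sigmoid unit; the toy data and the use of mean squared error here are just for illustration:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
y = tf.constant([[1.0]])
w = tf.Variable(tf.random.normal((2, 1)))
b = tf.Variable(tf.zeros((1,)))

with tf.GradientTape() as tape:
    y_hat = tf.sigmoid(tf.matmul(x, w) + b)    # forward pass
    loss = tf.reduce_mean((y - y_hat) ** 2)    # loss on this example

# backpropagation: gradients of the loss with respect to each parameter,
# computed by applying the chain rule backwards through the computation
grads = tape.gradient(loss, [w, b])
print([g.shape for g in grads])
```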
Practical Aspects of Training
Learning Rates
- Setting learning rates can be challenging
- If too small, convergence is slow
- If too large, network may diverge or overshoot
- Adaptive learning rates help optimize training (see the sketch after this list)
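A minimal sketch contrasting a fixed-learning-rate optimizer with an adaptive one in TensorFlow; the learning rate values and the toy variable are arbitrary:

```python
import tensorflow as tf

# fixed step size: the same learning rate is used for every parameter and step
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# adaptive optimizer: Adam adjusts the effective step size per parameter over time
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)

# one update step on a toy variable
w = tf.Variable(5.0)
with tf.GradientTape() as tape:
    loss = (w - 1.0) ** 2
grads = tape.gradient(loss, [w])
adam.apply_gradients(zip(grads, [w]))
print(w.numpy())
```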
Batch Training
- Computing the gradient over the entire dataset at every step is computationally infeasible
- Stochastic Gradient Descent (SGD): Uses a single data point per gradient step (fast but very noisy)
- Mini-Batch Gradient Descent: Uses a small batch of data points, balancing gradient accuracy and efficiency
- Batches also allow parallel computation on GPUs (see the sketch after this list)
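A minimal sketch of mini-batching with tf.data; the random dataset, feature size, and batch size of 32 are arbitrary:

```python
import numpy as np
import tensorflow as tf

# toy dataset of 1000 examples with 8 features each
x = np.random.randn(1000, 8).astype('float32')
y = np.random.randint(0, 2, size=(1000,))

# shuffle and split into mini-batches; each gradient step then uses one batch
# rather than a single example or the full dataset
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1000).batch(32)

for x_batch, y_batch in dataset.take(1):
    print(x_batch.shape)   # (32, 8)
```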
Avoiding Overfitting
- Regularization: Techniques that discourage the model from memorizing the training data
- Dropout: Randomly sets a fraction of neuron activations to zero during training
- Early Stopping: Stops training when performance on a held-out validation set starts to degrade (see the sketch after this list)
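A minimal sketch of both techniques in TensorFlow: a Dropout layer inside the model and an EarlyStopping callback monitoring validation loss. The dropout rate, patience, and layer sizes are arbitrary, and the fit call is commented out because it assumes hypothetical x_train/y_train arrays:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),   # randomly zero 50% of activations during training
    tf.keras.layers.Dense(10),
])

# stop training once validation loss stops improving for 3 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                              restore_best_weights=True)

# model.compile(optimizer='adam',
#               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```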
Course Resources
- Lecture slides available online
- Piazza for questions and discussions
- Teaching team available for support
Conclusion
- This course provides foundational knowledge to create and understand deep learning models
- Prepare for rapid advancements in the field
Next Lecture
- Deep sequence modeling using RNNs and Transformers by Ava