
MIT 6.S191: Introduction to Deep Learning

Jul 12, 2024

Instructor: Alexander Amini

Overview

  • Fast-paced and intense one-week course
  • Covers the rapidly changing field of AI and Deep Learning
  • Course structure has evolved significantly over the years
  • AI and deep learning now achieving and surpassing human performance in various fields

Importance of AI and Deep Learning

  • AI revolutionizing many areas: science, mathematics, physics, etc.
  • Rapid advancements making introductory lectures difficult to keep current
  • Example of deep learning-generated content becoming commonplace

Course Structure

  • A mix of technical lectures and software labs
  • Example Labs: Music generation, facial detection, large language models
  • Includes guest lectures from industry leaders
  • Final project pitch competition with prizes

Foundations of Deep Learning

What is Intelligence?

  • Intelligence: The ability to process information in order to inform future decisions
  • Artificial Intelligence: Giving computers the ability to process information and make decisions
  • Machine Learning: Teaching computers to process information from data, removing hard-coded rules
  • Deep Learning: Subset of ML using neural networks to process raw data

The Perceptron

  • Building block of neural networks
  • Steps:
    1. Ingest multiple inputs
    2. Multiply each input by a corresponding weight
    3. Sum the results
    4. Add bias term
    5. Apply a nonlinear activation function (e.g., sigmoid, ReLU)
  • Importance of nonlinearity: Allows model to handle complex, real-world data
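
A minimal NumPy sketch of these steps (the input, weight, and bias values below are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    # Nonlinear activation squashing the output into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # Steps 1-3: weight each input and sum; step 4: add bias; step 5: nonlinearity
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 2.0, -1.0])   # inputs
w = np.array([0.5, -0.3, 0.8])   # weights
b = 0.1                          # bias
print(perceptron(x, w, b))
```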

Neural Networks

  • Composed of layers of perceptrons (neurons)
  • Single-Layer Network: Inputs feed into one hidden layer, which feeds the output
  • Multi-Layer (Deep) Network: Stacked layers creating hierarchical models
  • Use of nonlinear activation functions between layers
  • Code Implementation: Libraries such as TensorFlow are used to define and train networks (see the sketch below)
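
For instance, a stacked (deep) network might be defined in TensorFlow roughly as follows; the layer sizes and activations are illustrative, not the course's lab code:

```python
import tensorflow as tf

# A small multi-layer network: stacked Dense layers with
# nonlinear activations between them.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g., a 10-class output
])
```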

Training Neural Networks

Gradient Descent

  • Optimization method to minimize loss function
  • Steps:
    1. Initialize weights randomly
    2. Compute loss
    3. Calculate gradients
    4. Update weights in opposite direction of gradient
    5. Repeat until convergence
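
A toy TensorFlow sketch of this loop, assuming a simple linear model and synthetic data (names and values are illustrative):

```python
import tensorflow as tf

# Synthetic inputs and targets for a toy regression problem
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[1.0], [2.0]])

weights = tf.Variable(tf.random.normal(shape=(2, 1)))  # 1. initialize randomly
lr = 0.01                                              # learning rate

for step in range(1000):
    with tf.GradientTape() as tape:
        pred = tf.matmul(x, weights)
        loss = tf.reduce_mean((pred - y) ** 2)         # 2. compute loss
    grads = tape.gradient(loss, [weights])             # 3. calculate gradients
    weights.assign_sub(lr * grads[0])                  # 4. step opposite the gradient
# 5. repeat until convergence (a fixed step count is used here for simplicity)
```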

Loss Functions

  • Softmax cross entropy for classification tasks
  • Mean squared error for regression tasks
  • Objective: Minimize the difference between predicted output and actual output
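
For example, both losses can be computed in TensorFlow like this (the labels, logits, and targets below are made-up values):

```python
import tensorflow as tf

# Classification: softmax cross entropy between true labels and predicted logits
labels = tf.constant([1, 0])                       # integer class labels
logits = tf.constant([[0.2, 2.1], [1.5, -0.3]])    # raw network outputs
ce = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Regression: mean squared error between predictions and targets
y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.1, 1.9, 3.2])
mse = tf.reduce_mean(tf.square(y_true - y_pred))
```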

Backpropagation

  • Algorithm to compute gradients
  • Uses chain rule to propagate error backwards through the network
  • Adjusts weights to minimize loss
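
A small illustration of the chain rule at work, using TensorFlow's automatic differentiation (the function itself is arbitrary):

```python
import tensorflow as tf

# Loss L depends on w through an intermediate value z.
# GradientTape records the forward pass and applies the chain rule backwards.
w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    z = w * 3.0              # intermediate value
    loss = (z - 5.0) ** 2
# dL/dw = dL/dz * dz/dw = 2*(z - 5) * 3 = 2*(6 - 5)*3 = 6
print(tape.gradient(loss, w))  # 6.0
```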

Practical Aspects of Training

Learning Rates

  • Setting learning rates can be challenging
  • If too small, convergence is slow
  • If too large, network may diverge or overshoot
  • Adaptive learning rates help optimize training
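
In TensorFlow, this trade-off appears when choosing an optimizer; Adam is one common adaptive option (the learning-rate values shown are typical defaults, not course-prescribed settings):

```python
import tensorflow as tf

# Fixed learning rate: must be tuned by hand
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# Adaptive optimizer: adjusts effective per-parameter step sizes during training
adam = tf.keras.optimizers.Adam(learning_rate=0.001)
```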

Batch Training

  • Computing the gradient over the full dataset at every step is computationally infeasible for large datasets
  • Stochastic Gradient Descent (SGD): Uses a single data point for gradient computation (too noisy)
  • Mini-Batch Gradient Descent: Uses a batch of data points, balances accuracy and efficiency
  • Allows for parallel computation using GPUs
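
A sketch of a mini-batch pipeline using tf.data (the dataset shapes and batch size of 32 are illustrative):

```python
import tensorflow as tf

# Synthetic dataset: 1000 examples with 10 features and binary labels
x = tf.random.normal((1000, 10))
y = tf.random.uniform((1000,), maxval=2, dtype=tf.int32)

# Shuffle, then group examples into mini-batches; gradients are averaged per batch
dataset = (tf.data.Dataset.from_tensor_slices((x, y))
           .shuffle(buffer_size=1000)
           .batch(32))

for x_batch, y_batch in dataset.take(1):
    print(x_batch.shape)  # (32, 10): one mini-batch per gradient update
```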

Avoiding Overfitting

  • Regularization: Techniques to prevent overfitting
    • Dropout: Randomly sets a fraction of neuron activations to zero during training
    • Early Stopping: Stops training when performance on a validation set starts to degrade
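
A sketch of both techniques in Keras (the dropout rate, layer sizes, and patience value are illustrative):

```python
import tensorflow as tf

# Dropout randomly zeroes a fraction of activations during training;
# EarlyStopping halts training when validation loss stops improving.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),          # drop 50% of activations while training
    tf.keras.layers.Dense(10, activation="softmax"),
])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# Usage (training data assumed to exist):
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])
```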

Course Resources

  • Lecture slides available online
  • Piazza for questions and discussions
  • Teaching team available for support

Conclusion

  • This course provides foundational knowledge to create and understand deep learning models
  • Prepare for rapid advancements in the field

Next Lecture

  • Deep sequence modeling with RNNs and Transformers, presented by Ava