Introduction to Deep Learning - Lecture Notes

Jul 15, 2024

Lecture by: Alexander Amini and Ava Amini

Course Overview

  • Fast-paced program on Deep Learning at MIT
  • Hands-on experience with software labs

Progress in AI & Deep Learning

  • Major advancements in the last decade
  • Generative deep learning came to the forefront in 2022
    • Can generate new, unseen data

Example of Deep Learning

  • Introductory video showcasing synthetic generation of video & audio

Applications of Deep Learning

  • Generating synthetic environments for autonomous vehicle training
  • Language processing and generation
  • Software that generates software

Course Structure

  • One-week intensive program
  • Lectures and software labs combination
  • Foundations covered in the first lecture
  • Guest lectures from industry and academia
  • Prizes for outstanding work in labs and project

Daily Breakdown

  1. Lectures: Highly technical, foundational concepts
  2. Labs: Hands-on implementation of lecture concepts
  3. Project Pitch Competition: Friday
    • Focus on innovation and idea novelty
    • Significant prizes (e.g. Nvidia GPU)
  4. Final Lab: Combine concepts from lectures, focusing on robust, safe AI models

Goals & Terminology

Intelligence and AI

  • Intelligence: Ability to process information in order to inform future decisions
  • Artificial Intelligence (AI): Algorithms that mimic human information processing
  • Machine Learning (ML): Subset of AI, teaching a machine from experiences/data
  • Deep Learning: Subset of ML, focuses on neural networks to extract data patterns and learn tasks

Core Concepts in Deep Learning

Perceptron

  • Basic unit of neural networks
  • Structure (sketched in code below):
    1. Takes inputs (X)
    2. Multiplies them by weights (W)
    3. Adds a bias term (W0)
    4. Passes the result through a non-linear activation function (G)
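A minimal NumPy sketch of this forward pass (the sigmoid choice for G and all the numbers are illustrative assumptions, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation g: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, w0):
    # Weighted sum of inputs plus bias, passed through the activation
    z = w0 + np.dot(x, w)
    return sigmoid(z)

x = np.array([1.0, -2.0])      # inputs X
w = np.array([3.0, -1.0])      # weights W
w0 = 1.0                       # bias term W0
print(perceptron(x, w, w0))    # single scalar output
```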

Non-Linear Activation Functions

  • Introduce non-linearities
  • Common examples (see the sketch after this list):
    • Sigmoid: Outputs between 0 and 1, useful for probabilities
    • ReLU: Computationally efficient, preferred in modern networks
  • Necessary for handling real-world, non-linear data
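A quick NumPy sketch of the two activations listed above (example inputs are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Smoothly maps any real input to (0, 1); handy for probabilities
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Zero for negative inputs, identity for positive; cheap to compute
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))   # approximately [0.119 0.5 0.953]
print(relu(z))      # [0. 0. 3.]
```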

Neural Network Construction

  • Single neuron (Perceptron): Basis of neural networks
  • Layers: Stack multiple neurons
  • Deep Neural Networks: Stack multiple layers (sketched below)
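A rough sketch of stacking neurons into layers and layers into a deep network, in plain NumPy with random weights (layer sizes and names are made up for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dense_layer(x, W, b, activation=relu):
    # One layer = many perceptrons in parallel: matrix multiply + bias + activation
    return activation(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                     # one example with 4 features

# Two hidden layers stacked on top of each other, then a single output unit
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

h1 = dense_layer(x, W1, b1)
h2 = dense_layer(h1, W2, b2)
y_hat = dense_layer(h2, W3, b3, activation=lambda z: z)   # linear output
print(y_hat.shape)   # (1, 1)
```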

Training Neural Networks

Loss Function

  • Measures network's errors
  • Common loss functions (sketched below):
    • Cross-Entropy Loss: For binary classification
    • Mean Squared Error (MSE): For continuous predictions
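Minimal NumPy sketches of the two losses (the example labels and predictions are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Penalizes confident but wrong probability predictions heavily
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    # Average squared distance between prediction and target
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.6])
print(binary_cross_entropy(y_true, y_prob))   # ~0.28
print(mean_squared_error(y_true, y_prob))     # ~0.07
```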

Gradient Descent

  • Optimization algorithm to minimize loss
  • Steps (sketched in code after this list):
    1. Initialize weights
    2. Compute the gradient (how the loss changes with respect to each weight)
    3. Update the weights in the opposite direction of the gradient
    4. Repeat until convergence
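A toy gradient descent loop on a one-parameter quadratic loss, just to make the update rule concrete (the loss function and learning rate are made-up illustrations):

```python
# Minimize L(w) = (w - 3)^2 with plain gradient descent
w = 0.0                 # 1. initialize the weight
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)              # 2. gradient of the loss w.r.t. w
    w = w - learning_rate * grad    # 3. step opposite the gradient
                                    # 4. repeat until convergence
print(w)   # approaches 3, the minimizer of the loss
```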

Backpropagation

  • Calculates the gradient of the loss with respect to each weight
  • Uses the chain rule to propagate errors backward through the network (worked example below)
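A tiny worked example of the chain rule on a one-input, one-hidden-unit network (the sigmoid activation, squared-error loss, and all numbers are illustrative assumptions):

```python
import numpy as np

# Forward pass: x -> z = w1*x -> h = sigmoid(z) -> y_hat = w2*h, loss = (y_hat - y)^2
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 2.0, 1.0
w1, w2 = 0.5, -0.3

z = w1 * x
h = sigmoid(z)
y_hat = w2 * h
loss = (y_hat - y) ** 2

# Backward pass: apply the chain rule from the loss back to each weight
dL_dyhat = 2 * (y_hat - y)
dL_dw2 = dL_dyhat * h                # since y_hat = w2 * h
dL_dh = dL_dyhat * w2
dL_dz = dL_dh * h * (1 - h)          # derivative of sigmoid is h * (1 - h)
dL_dw1 = dL_dz * x                   # since z = w1 * x

print(dL_dw2, dL_dw1)   # gradients used by gradient descent to update w2 and w1
```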

Practical Optimization

  • Learning Rate: Size of the step taken in the direction of the gradient
    • Too low = slow learning
    • Too high = might diverge/oscillate
    • Adaptive algorithms exist to balance learning rates
  • Mini-batches: Handle large datasets by computing gradients on small random subsets of the data
  • Regularization: Prevents overfitting
    • Dropout: Randomly deactivates neurons during training
    • Early Stopping: Stops training when validation performance stops improving (see the training-loop sketch below)
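A schematic mini-batch training loop combining the techniques above (mini-batches, dropout, and early stopping); all names, hyperparameters, and the placeholder validation loss are illustrative, not the course's lab code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # toy dataset: 100 examples, 4 features
y = rng.normal(size=(100,))

def iterate_minibatches(X, y, batch_size, rng):
    # Shuffle once per epoch, then yield small random batches of the data
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

def dropout(h, rate, rng, training=True):
    # Randomly zero activations during training; rescale so the expected value is unchanged
    if not training:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

# Early stopping skeleton: track the best validation loss, stop after `patience` bad epochs
best_val, patience, bad_epochs = np.inf, 3, 0
for epoch in range(100):
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=16, rng=rng):
        h = dropout(X_batch, rate=0.5, rng=rng)   # stand-in for a hidden layer's activations
        # ... forward pass, loss, backprop, and weight update would go here ...

    val_loss = 1.0 / (epoch + 1) + rng.normal(scale=0.01)   # placeholder validation loss
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stopping at epoch {epoch}")
            break
```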

Conclusion - Key Takeaways

  • Foundations of neural networks: Perceptrons to Deep Nets
  • Optimization Techniques
  • Future classes on Sequence Modeling and Advanced Architectures (e.g., Transformers)