🚗

Deep Learning for Self-Driving Cars - Lecture Notes

Jul 4, 2024

Introduction

  • Course: MIT 6.S094, Deep Learning for Self-Driving Cars, part of a series of MIT courses on deep learning.
  • Resources: Lecture videos and slides are available at deeplearning.mit.edu.
  • Assignments: Emailed to registered students.
  • Contact: [email protected] for questions and comments.

Deep Learning Basics

Overview

  • Deep Learning (DL): Extracts useful patterns from data with minimal human effort.
  • Key Concept: Optimization of neural networks using languages and libraries such as Python and TensorFlow.
  • Challenge: Asking good questions and obtaining good data.

Historical Context

  • Digitization: Easy access to data in digital form.
  • Hardware: Advancements in CPUs, GPUs, TPUs enable large-scale execution of DL algorithms.
  • Community and Tools: Collaboration and tools like GitHub, TensorFlow, PyTorch speed up problem-solving.
  • Applications: Face recognition, scene understanding, NLP, medical diagnosis, autonomous vehicles, ads, recommender systems, games.

Key Developments in Neural Networks

Historical Milestones

  • 1940s: Early neural network concepts.
  • 1950s: Perceptron implementation.
  • 1970s–80s: Backpropagation, restricted Boltzmann machines, RNNs.
  • 1990s: CNNs, MNIST dataset, LSTM, bi-directional RNNs.
  • 2006: Deep Learning rebranded with Deep Belief Nets.
  • 2009: ImageNet dataset.
  • 2012: AlexNet and major improvements in DL.
  • 2014: GANs introduced, DeepFace for face recognition.
  • 2016-17: AlphaGo and AlphaZero achievements.
  • 2018: Year of NLP breakthroughs (e.g., Google's BERT).

Tooling Evolution

  • 1960s to present: Tooling has progressed from hand-built perceptrons to frameworks such as TensorFlow and PyTorch.
  • Importance: Tools minimize human effort needed to reach solutions.

Practical Applications and Challenges

Deep Learning in Real-world Applications

  • Humanoid Robotics: Limited DL use, mostly traditional methods.
  • Autonomous Vehicles: Predominantly non-DL methods except for perception.

Ethical Issues and Safety

  • AI Safety: Need for ethical considerations, human oversight.
  • Unexpected Consequences: Optimizing for a narrow objective can produce unforeseen behaviors, e.g., game-playing agents exploiting reward loopholes to maximize score.

Theoretical Foundations of Deep Learning

Learning Representations

  • Core Idea: Form higher-level abstractions of raw data so that interpretation and classification become easier.
  • Human-Driven Goal: Simplify complex problems into compact representations (in the spirit of Einstein's view that explanations should be as simple as possible).
  • Compression and Simplicity: Find simple yet effective representations (e.g., the heliocentric model greatly simplified descriptions of planetary motion).

Removing Human Input

  • Reduced Human Role: DL automates feature extraction, reducing need for expert input.
  • Limitations & Trade-offs: Balancing excitement and realism (Gartner Hype Cycle).

Technical Aspects of Neural Networks

Fundamental Unit: Neuron

  • Basic Structure: Input weights, bias, activation function, output.
  • Comparison with Biological Neurons: Simplicity of artificial neurons vs. biological complexity.
  • Efficiency Issues: The brain is far more power-efficient, and biological learning differs from backpropagation.
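
In code, a neuron reduces to a weighted sum passed through a nonlinearity. A minimal NumPy sketch (the weights, bias, and inputs below are made-up values for illustration):

    import numpy as np

    def neuron(x, w, b):
        """One artificial neuron: weighted sum of inputs plus bias,
        squashed by a sigmoid activation."""
        z = np.dot(w, x) + b              # pre-activation: w . x + b
        return 1.0 / (1.0 + np.exp(-z))   # sigmoid maps z to (0, 1)

    x = np.array([0.5, -1.2, 3.0])   # inputs
    w = np.array([0.4, 0.1, -0.7])   # weights (learned in practice)
    b = 0.2                          # bias (learned in practice)
    print(neuron(x, w, b))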

Network Architecture

  • Layers: Input, hidden, output; stacked neurons form deep networks.
  • Universal Approximation: A network with a single hidden layer can, in principle, approximate any continuous function; in practice, depth makes learning tractable.
  • Parallelizability: Efficient execution on GPUs and TPUs.
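
As a concrete sketch, a small fully connected network with one hidden layer in Keras (layer sizes here are arbitrary choices, not values from the lecture):

    import tensorflow as tf

    # Input -> hidden -> output; stacking more hidden layers makes the network "deep".
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),                    # e.g. a flattened 28x28 image
        tf.keras.layers.Dense(128, activation='relu'),   # hidden layer
        tf.keras.layers.Dense(10, activation='softmax')  # output: 10 class probabilities
    ])
    model.summary()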

Training Neural Networks

  • Activation Functions: Key to non-linearity and learning (e.g., ReLU, sigmoid).
  • Loss Functions: MSE for regression, cross-entropy for classification.
  • Backpropagation: Adjusts weights based on error gradients.
  • Optimization Algorithms: Variants such as SGD and momentum-based methods drive the learning process.
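
A minimal sketch of this loop, gradient descent on mean squared error for a single linear unit, in pure NumPy (the toy data and learning rate are made up for illustration):

    import numpy as np

    # Toy regression data: y = 2x + 1 plus noise
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=100)
    y = 2 * X + 1 + 0.1 * rng.standard_normal(100)

    w, b, lr = 0.0, 0.0, 0.1
    for epoch in range(200):
        y_hat = w * X + b              # forward pass
        err = y_hat - y
        grad_w = 2 * np.mean(err * X)  # dMSE/dw
        grad_b = 2 * np.mean(err)      # dMSE/db
        w -= lr * grad_w               # gradient descent update
        b -= lr * grad_b

    print(w, b)  # should approach 2 and 1

Backpropagation generalizes these hand-derived gradients to networks of arbitrary depth via the chain rule.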

Regularization Techniques

  • Overfitting: Regularization prevents the network from memorizing the training data instead of generalizing.
  • Validation & Early Stopping: Monitor performance on validation set to avoid overfitting.
  • Dropout: Randomly removes neurons during training.
  • Normalization: Input and batch normalization to stabilize learning.
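
A hedged Keras sketch combining these techniques, dropout, batch normalization, and early stopping on a validation set (all hyperparameters are illustrative):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.BatchNormalization(),   # normalize activations to stabilize learning
        tf.keras.layers.Dropout(0.5),           # randomly drop half the units during training
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

    # Stop training when validation loss stops improving, keeping the best weights.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=5, restore_best_weights=True)
    # model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])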

Deep Learning Techniques and Models

Convolutional Neural Networks (CNNs)

  • Spatial Invariance: Uses filters to detect features regardless of their position.
  • Key Architectures: AlexNet, ResNet, GoogLeNet, SENet.
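
A minimal convolutional stack in Keras, far shallower than AlexNet or ResNet but showing the pattern of convolution, pooling, and a final classifier (filter counts and sizes are arbitrary):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),                 # small RGB image
        tf.keras.layers.Conv2D(16, 3, activation='relu'),  # 16 learned 3x3 filters slide over the image
        tf.keras.layers.MaxPooling2D(),                    # downsample, keeping strongest responses
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax')    # classify from extracted features
    ])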

Object Detection and Semantic Segmentation

  • Object Detection: Identifies and classifies objects within an image (e.g., Faster R-CNN, SSD, YOLO).
  • Semantic Segmentation: Pixel-level classification of images.
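
Detectors like Faster R-CNN, SSD, and YOLO are commonly scored by intersection over union (IoU) between predicted and ground-truth boxes; a small illustrative sketch, assuming boxes in (x1, y1, x2, y2) format:

    def iou(box_a, box_b):
        """Intersection over union of two axis-aligned boxes, each (x1, y1, x2, y2)."""
        x1 = max(box_a[0], box_b[0])               # overlap rectangle
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.14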

Transfer Learning

  • Concept: Fine-tuning pre-trained models on new datasets.
  • Use Cases: Specialized tasks like pedestrian detection.
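
A hedged sketch of the fine-tuning recipe in Keras: load an ImageNet-pretrained backbone, freeze it, and train a new head (the choice of MobileNetV2 and the binary pedestrian head are illustrative assumptions, not from the lecture):

    import tensorflow as tf

    # Pretrained feature extractor, without its original ImageNet classification head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights='imagenet')
    base.trainable = False   # freeze pretrained weights; only the new head learns

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation='sigmoid')  # e.g. pedestrian vs. no pedestrian
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')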

Autoencoders and Representations

  • Autoencoders: Use bottleneck architecture to compress data into meaningful representations.
  • Embeddings: Efficient representations for large datasets.
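
A minimal bottleneck autoencoder sketch in Keras (the 32-dimensional code is an arbitrary choice; the network is trained to reproduce its own input):

    import tensorflow as tf

    inputs = tf.keras.Input(shape=(784,))
    code = tf.keras.layers.Dense(32, activation='relu')(inputs)        # bottleneck: compressed representation
    outputs = tf.keras.layers.Dense(784, activation='sigmoid')(code)   # reconstruction of the input

    autoencoder = tf.keras.Model(inputs, outputs)
    autoencoder.compile(optimizer='adam', loss='mse')
    # autoencoder.fit(x, x, ...)  # note: the target is the input itself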

Generative Adversarial Networks (GANs)

  • Method: Generator and discriminator networks compete to produce realistic data.
  • Applications: Image generation, video consistency, high-resolution image creation.
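
A structural sketch of the two competing networks in Keras (layer sizes are illustrative; a full training loop alternates updates between the two):

    import tensorflow as tf

    # Generator: maps a random noise vector to a fake sample (here, a flattened image).
    generator = tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(784, activation='tanh')
    ])

    # Discriminator: classifies a sample as real (1) or generated (0).
    discriminator = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    # Training alternates: improve the discriminator on real vs. generated samples,
    # then update the generator to fool the (temporarily frozen) discriminator.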

Natural Language Processing (NLP)

  • Word Embeddings: Word2Vec for meaningful word representations.
  • Recurrent Neural Networks (RNNs): Handle sequence data, capture temporal dependencies.
  • LSTMs: Manage long-term dependencies in sequential data.
  • Attention Mechanisms: Improve context understanding in sequence data.
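
As a sketch, an embedding layer feeding an LSTM for sequence classification in Keras (vocabulary size, sequence length, and dimensions are illustrative):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(50,), dtype='int32'),                 # sequence of 50 token ids
        tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # learned word vectors
        tf.keras.layers.LSTM(64),                                   # carries context across the sequence
        tf.keras.layers.Dense(1, activation='sigmoid')              # e.g. sentiment label
    ])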

Automated Machine Learning (AutoML)

  • Neural Architecture Search: Automates discovery of effective neural network architectures.
  • Application: Streamlines DL processes by minimizing human intervention.
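
Full neural architecture search is beyond a snippet, but its core loop, sample an architecture, evaluate it, keep the best, can be sketched with random search (the search space and placeholder scoring below are stand-ins, not a real NAS system):

    import random

    search_space = {
        'layers': [1, 2, 3],
        'units': [32, 64, 128],
        'activation': ['relu', 'tanh'],
    }

    def sample_architecture():
        """Draw one candidate architecture from the search space."""
        return {key: random.choice(options) for key, options in search_space.items()}

    def evaluate(arch):
        """Placeholder: in practice, build and briefly train a model from `arch`
        and return its validation accuracy."""
        return random.random()

    best = max((sample_architecture() for _ in range(20)), key=evaluate)
    print(best)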

Deep Reinforcement Learning

  • Concept: Agents learn optimal actions through rewards in environments.
  • Applications: Robotics, video gaming, autonomous systems.
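
The reward-driven idea is easiest to see in tabular Q-learning, the precursor that deep RL extends by replacing the table with a neural network; this toy chain environment (reach the rightmost state) is purely illustrative:

    import numpy as np

    n_states, n_actions = 5, 2              # chain world: action 0 = left, 1 = right
    Q = np.zeros((n_states, n_actions))     # table of state-action values
    alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration

    rng = np.random.default_rng(0)
    for episode in range(500):
        s = 0
        while s != n_states - 1:            # rightmost state is the goal
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
            # Update Q(s, a) toward r + gamma * max_a' Q(s', a').
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next

    print(Q)  # values should come to favor action 1 (move right) in every state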

Conclusion

  • Goal: Progressing from theory to practical applications in DL, emphasizing ethical considerations.
  • Resources: All materials available at deeplearning.mit.edu.

Thank you for attending!