Understanding Sequence Modeling in Neural Networks
Sep 19, 2024
Lecture 2: 6.S191 - Sequence Modeling Problems
Introduction
Lecturer:
Ava
Focus:
Sequence modeling problems in neural networks
Background:
Previous lecture covered deep learning, essentials of neural networks, and training using gradient descent.
Sequence Modeling
Definition:
Problems involving sequential data processing.
Examples:
Predicting the next position of a moving ball using its past positions.
An audio waveform can be split into a sequence of sound chunks.
Text can be seen as a sequence of characters or words.
Applications include medical readings (EKGs), stock price changes, biological sequences (DNA, protein).
Neural Networks and Sequence Modeling
Concepts:
Build models to handle sequential data using neural networks.
Previous Example:
Binary classification problem.
Current Focus:
Sequence modeling:
Sequence to Single Output (many-to-one):
Sentiment analysis on a sentence.
Single Input to Sequence Output (one-to-many):
Image captioning.
Sequence to Sequence (many-to-many):
Language translation.
Neural Network Design for Sequence Problems
Initial Model (Perceptron):
Single input mapped to single output.
Does not account for time or sequence.
Time Step Computation:
Apply the same feedforward network independently at each time step.
Problem: no linkage between time steps, so the inherent sequential structure of the data is lost (a sketch of this naive setup follows).
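A minimal NumPy sketch of that naive per-time-step setup (shapes and values are illustrative assumptions, not code from the lecture):

```python
import numpy as np

# Naive setup: a single dense layer applied independently at every time step.
rng = np.random.default_rng(0)
T, input_dim, output_dim = 5, 3, 2            # sequence length, feature sizes (placeholders)
W = rng.normal(size=(input_dim, output_dim))
b = np.zeros(output_dim)

x_seq = rng.normal(size=(T, input_dim))       # one sequence of T inputs

# Each prediction depends only on x_t; shuffling the time steps would not
# change any individual output -- that is the missing linkage.
y_seq = np.array([np.tanh(x_t @ W + b) for x_t in x_seq])
print(y_seq.shape)                            # (5, 2)
```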
Incorporating Sequence Dependency
Recurrent Cell Concept:
Introduce a state h to store information from previous time steps.
State is updated and propagated time step by time step.
Basis for Recurrent Neural Networks (RNNs).
RNNs (Recurrent Neural Networks)
Core Idea:
Use a hidden state h to model sequence dependence.
Equation:
Update the state from the previous state and the current input using weight matrices: h_t = tanh(W_hh · h_(t-1) + W_xh · x_t); the output at each step is y_t = W_hy · h_t.
Implementation:
Initialize hidden state and input.
Iterate through sequence to update state and predict output.
Use backpropagation through time for training.
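A minimal NumPy sketch of this forward loop (dimensions are illustrative assumptions; in practice, training with backpropagation through time is handled by an autodiff framework):

```python
import numpy as np

# Vanilla RNN forward pass: initialize the state, then iterate through the sequence.
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 6, 3, 4, 2   # placeholder sizes

W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))   # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(hidden_dim, output_dim))  # hidden -> output

x_seq = rng.normal(size=(T, input_dim))       # one input sequence
h = np.zeros(hidden_dim)                      # initialize the hidden state

outputs = []
for x_t in x_seq:                             # iterate through the sequence
    h = np.tanh(x_t @ W_xh + h @ W_hh)        # update state from input and previous state
    outputs.append(h @ W_hy)                  # predict an output at this step

print(np.array(outputs).shape)                # (6, 2)
```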
Challenges and Techniques in RNNs
Vanishing/Exploding Gradients:
Repeated multiplication by the recurrent weight matrix over many time steps can make gradients explode (very large) or vanish (very small).
Mitigation: Smart initialization, careful choice of activation functions, gating mechanisms.
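A sketch of how two of these mitigations can be expressed as Keras layer options (assuming TensorFlow is available; the unit count is arbitrary):

```python
import tensorflow as tf

# - ReLU avoids the saturating regions of tanh/sigmoid that shrink gradients.
# - Initializing the recurrent weights to the identity matrix ("smart
#   initialization") starts the cell close to simply copying its state forward.
rnn_layer = tf.keras.layers.SimpleRNN(
    units=64,
    activation="relu",
    recurrent_initializer="identity",
)
```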
Long Short-Term Memory (LSTM):
A type of RNN with gating to handle long-term dependencies.
Uses a separate cell state plus gates (sigmoid and tanh activations) to control what information is kept, updated, and output.
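A sketch of a single LSTM step using the standard gate equations (the notation and the packing of the weight matrices are assumptions, not the lecture's exact formulation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (input_dim, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)."""
    z = x_t @ W + h_prev @ U + b                   # pre-activations for all four gates
    i, f, o, g = np.split(z, 4)                    # input, forget, output gates + candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gate values in (0, 1)
    g = np.tanh(g)                                 # candidate cell update
    c = f * c_prev + i * g                         # cell state: forget some, write some
    h = o * np.tanh(c)                             # gated hidden state / output
    return h, c
```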
Applications of RNNs
Music Generation:
Use RNNs to compose new music based on training data.
Sequence Classification:
Assign sentiment to a sentence.
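A sketch of such a many-to-one sentiment classifier in Keras (assuming TensorFlow; vocab_size and the other hyperparameters are placeholders, not values from the lecture):

```python
import tensorflow as tf

vocab_size, embed_dim, hidden_dim = 10_000, 64, 128   # placeholder sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),  # token ids -> vectors
    tf.keras.layers.LSTM(hidden_dim),                  # encode the sequence into one state
    tf.keras.layers.Dense(1, activation="sigmoid"),    # positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```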
Limitations of RNNs
Memory Bottleneck:
Constrains ability to encode long sequences.
Sequential Processing:
Time steps must be processed one after another, which prevents parallelization and makes long sequences computationally expensive.
Vanishing Gradients:
Reduces capacity to model long-term dependencies.
Attention Mechanism
Concept:
Focus on important input parts, capturing dependencies over a sequence.
Components:
Query, Key, and Value vectors; the similarity between a query and the keys gives attention scores, which are used to weight the values.
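A sketch of this score computation in its common scaled dot-product form (a standard formulation, not necessarily the lecture's exact code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Query-key similarity -> softmax attention weights -> weighted sum of values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # attention-weighted values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries of dimension 8
K = rng.normal(size=(4, 8))   # 4 keys
V = rng.normal(size=(4, 8))   # 4 values
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```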
Self-Attention in Transformers:
Attend to input parts to extract important features.
Positional encoding to maintain order.
Basis for models like GPT and BERT.
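The positional encoding mentioned above can take several forms; one common choice, sketched here as an assumption rather than the lecture's definition, is the sinusoidal encoding from the original Transformer paper:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sine/cosine encodings of position, added to token embeddings to keep order."""
    positions = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                             # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                               # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                          # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                          # odd dimensions: cosine
    return pe

print(sinusoidal_positional_encoding(10, 16).shape)                # (10, 16)
```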
Applications Beyond RNNs
Transformers:
Run multiple attention heads in parallel, each extracting different features from the input.
Used in natural language processing, biology, computer vision.
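A simplified sketch of running several heads in parallel over the same input (real transformer layers also apply learned per-head projection matrices, which are omitted here):

```python
import numpy as np

def multi_head_self_attention(X, num_heads):
    """Split the feature dimension into heads, attend within each head, concatenate."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Xh = X.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)
    scores = Xh @ Xh.transpose(0, 2, 1) / np.sqrt(d_head)          # per-head similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                 # per-head softmax
    out = weights @ Xh                                             # (heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)        # concatenate the heads

X = np.random.default_rng(0).normal(size=(6, 16))
print(multi_head_self_attention(X, num_heads=4).shape)             # (6, 16)
```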
Conclusion
Overview of sequence modeling, RNNs, attention mechanisms, and transformers.
Importance of attention and its role in modern deep learning architectures.