Understanding Sequence Modeling in Neural Networks

Sep 19, 2024

Lecture 2: 6.S191 - Sequence Modeling Problems

Introduction

  • Lecturer: Ava
  • Focus: Sequence modeling problems in neural networks
  • Background: The previous lecture covered the foundations of deep learning, the essentials of neural networks, and training with gradient descent.

Sequence Modeling

  • Definition: Problems that involve processing sequential data, where the order of the inputs matters.
  • Examples:
    • Predicting the next position of a moving ball using its past positions.
    • An audio waveform can be split into a sequence of sound segments.
    • Text can be seen as a sequence of characters or words.
    • Applications include medical readings (EKGs), stock price changes, biological sequences (DNA, protein).

Neural Networks and Sequence Modeling

  • Concepts: Build models to handle sequential data using neural networks.
  • Previous Example: A binary classification problem with a single fixed-size input and a single output.
  • Current Focus: Sequence modeling:
    • Sequence Input to Single Output: Sentiment analysis of a sentence.
    • Single Input to Sequence Output: Image captioning.
    • Sequence to Sequence: Language translation.

Neural Network Design for Sequence Problems

  • Initial Model (Perceptron):
    • Single input mapped to single output.
    • Does not account for time or sequence.
  • Time Step Computation:
    • Apply the same neural network independently at each time step (see the sketch after this list).
    • Problem: No linkage between time steps, so the temporal structure inherent in the data is lost.
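
As a concrete illustration, here is a minimal NumPy sketch (dimensions and weights are illustrative assumptions, not the lecture's code) of applying the same feedforward network independently at each time step; nothing computed at step t is carried to step t+1:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))  # one dense layer mapping 8 inputs to 4 outputs

def feedforward(x_t):
    # The same network is applied at every time step, with no memory of the past.
    return np.tanh(W @ x_t)

sequence = [rng.normal(size=8) for _ in range(5)]  # 5 time steps of 8-dim inputs
outputs = [feedforward(x_t) for x_t in sequence]   # each prediction ignores all other steps
```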

Incorporating Sequence Dependency

  • Recurrent Cell Concept:
    • Introduce an internal state h that stores information from previous time steps.
    • The state is updated and propagated from one time step to the next.
    • This is the basis for Recurrent Neural Networks (RNNs); the general recurrence is written out below.
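
In symbols, this idea is the standard recurrence relation below, where h_t is the state at step t, x_t is the input, and f_W is a learned update function whose weights W are shared across all time steps:

```latex
% The new state depends on the current input and the previous state;
% the same parameters W are reused at every time step.
h_t = f_W(x_t,\; h_{t-1})
```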

RNNs (Recurrent Neural Networks)

  • Core Idea: Use a hidden state h to model the dependency between time steps.
  • Equation: the standard update combines the previous state and the current input with learned weight matrices, h_t = tanh(W_hh h_(t-1) + W_xh x_t), and produces an output y_t = W_hy h_t.
  • Implementation:
    • Initialize hidden state and input.
    • Iterate through sequence to update state and predict output.
    • Use backpropagation through time (BPTT) for training; a forward-pass sketch follows this list.
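
The following is a minimal NumPy sketch of that forward pass under assumed dimensions and random, untrained weights; the backpropagation-through-time step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 16, 4
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden-to-output weights

def rnn_forward(xs):
    """Run a vanilla RNN over a list of input vectors, one per time step."""
    h = np.zeros(hidden_dim)                # initialize the hidden state
    outputs = []
    for x_t in xs:                          # iterate through the sequence
        h = np.tanh(W_xh @ x_t + W_hh @ h)  # update the state from input and previous state
        outputs.append(W_hy @ h)            # predict an output at this step
    return outputs, h

sequence = [rng.normal(size=input_dim) for _ in range(5)]
ys, final_state = rnn_forward(sequence)
```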

Challenges and Techniques in RNNs

  • Vanishing/Exploding Gradients:
    • Repeated multiplication by the same weight matrices during backpropagation through time makes gradients shrink or grow exponentially with sequence length.
    • Mitigation: Smart initialization, careful choice of activation functions, gating mechanisms.
  • Long Short-Term Memory (LSTM):
    • A type of RNN that uses gating to control information flow and capture long-term dependencies.
    • Maintains a separate cell state and combines sigmoid and tanh activations in its gates (see the sketch after this list).
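
For concreteness, here is a simplified sketch of a single LSTM step using the standard gate formulation; the stacked weight layout and sizes are illustrative assumptions, not the lecture's notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: W maps the concatenation [h_prev, x_t] to four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)              # forget gate: what to discard from the cell state
    i = sigmoid(i)              # input gate: what new information to write
    o = sigmoid(o)              # output gate: what to expose as the hidden state
    g = np.tanh(g)              # candidate values for the cell state
    c = f * c_prev + i * g      # additive cell-state update helps gradients flow
    h = o * np.tanh(c)          # new hidden state
    return h, c

# Example with assumed sizes: hidden state of 16, input of 8.
rng = np.random.default_rng(0)
hidden, inp = 16, 8
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inp))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=inp), np.zeros(hidden), np.zeros(hidden), W, b)
```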

Applications of RNNs

  • Music Generation:
    • Use RNNs to compose new music based on training data.
  • Sequence Classification:
    • Assign a sentiment label to a sentence (a Keras-style sketch follows this list).
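
As an example of the sequence-classification setup, here is a hedged Keras-style sketch; the vocabulary size, layer sizes, and training configuration are assumptions for illustration, not the lecture's code:

```python
import tensorflow as tf

vocab_size, embed_dim, hidden_units = 10_000, 64, 64
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),  # token ids -> dense vectors
    tf.keras.layers.SimpleRNN(hidden_units),           # encode the whole sentence into one state
    tf.keras.layers.Dense(1, activation="sigmoid"),    # single sentiment probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```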

Limitations of RNNs

  • Memory Bottleneck: A single fixed-size hidden state constrains the ability to encode long sequences.
  • Sequential Processing: Time steps must be processed one at a time, which prevents parallelization and makes long sequences slow to handle.
  • Vanishing Gradients: Reduces capacity to model long-term dependencies.

Attention Mechanism

  • Concept: Learn to attend to the most important parts of the input, capturing dependencies across the entire sequence.
  • Components: Query, Key, and Value vectors are compared to compute attention scores (a single-head sketch follows this list).
  • Self-Attention in Transformers:
    • Attend to input parts to extract important features.
    • Positional encoding injects order information, since attention itself has no built-in notion of position.
    • Basis for models like GPT and BERT.
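
Below is a minimal NumPy sketch of scaled dot-product self-attention for a single head; the projection matrices and sizes are illustrative placeholders:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) matrix of input embeddings; returns (seq_len, d_k)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v             # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # similarity of every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # attention-weighted sum of the values

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 16, 8, 5
X = rng.normal(size=(seq_len, d_model))             # embeddings for 5 tokens
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)              # shape (5, 8)
```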

Applications Beyond RNNs

  • Transformers:
    • Run multiple attention heads in parallel, each extracting different features from the input (see the continuation sketch after this list).
    • Used in natural language processing, biology, computer vision.
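
Continuing the self-attention sketch above (it reuses the self_attention function, X, rng, and d_model defined there), multiple heads run in parallel with independent projections and their outputs are concatenated:

```python
# Continuation of the previous sketch: several heads, each with its own projections.
num_heads, d_head = 4, 4
heads = []
for _ in range(num_heads):
    W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d_model, d_head)) for _ in range(3))
    heads.append(self_attention(X, W_q, W_k, W_v))  # each head attends independently
multi_head_output = np.concatenate(heads, axis=-1)  # shape (5, num_heads * d_head)
```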

Conclusion

  • Overview of sequence modeling, RNNs, attention mechanisms, and transformers.
  • Importance of attention and its role in modern deep learning architectures.