Deep Sequence Models and S4

Jul 1, 2024

Lecture on Deep Sequence Models and S4

Introduction

  • Discusses sequence-to-sequence networks in deep learning
  • Focus on Deep Sequence Models

Deep Sequence Models

  • Sequence Model: Maps an input sequence to an output sequence
  • Deep Sequence Model: Incorporates simple sequence models as layers inside deep architectures (e.g., CNNs, Transformers)

State Space Models

  • Definition: A model defined by linear differential equations relating an input, a latent state, and an output
  • Background: Classical statistical models from the 1960s, now adapted as deep learning layers
  • Variants: Classical probabilistic formulations vs. the deterministic feature transformation used in deep learning
  • Focus: Efficient computation and meaningful feature extraction

S4 Model

  • Overview: S4 is a structured state space model designed to be used as a layer inside deep neural networks
  • Properties: Continuous-time formulation, recurrent and convolutional views, computational efficiency
  • Key innovations: Efficient computation of the SSM and its adaptation to the deep learning setting

State Space Model (SSM)

Basic Structure

  • Equations: Defined through linear differential equations (written out below)
  • Input: A function u(t)
  • Output: A function y(t)
  • Role of A, B, C, D matrices: Learnable parameters of the state transition and output projection
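
For reference, the standard continuous-time SSM equations (the form used throughout the S4 line of work; D is often treated as a simple skip connection):

```latex
\begin{aligned}
  x'(t) &= A\,x(t) + B\,u(t) \\
  y(t)  &= C\,x(t) + D\,u(t)
\end{aligned}
```

Here u(t) is the input signal, x(t) the N-dimensional latent state, and y(t) the output.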

Important Concepts

  • Latent state x(t): A higher-dimensional intermediate representation summarizing the input history
  • Output projection: The output y(t) is a linear combination (via C) of the latent state
  • Function vs. sequence: The continuous-time equations map a function u(t) to a function y(t); discrete sequences only appear after discretization

Characteristics

  • Continuity: Benefits for irregularly sampled data and continuous processes (e.g., audio waveforms)
  • Recurrence: Discretizing the continuous model yields a linear recurrence over discrete sequences, enabling efficient online/auto-regressive computation
  • Parallel computation: Unrolling the recurrence as a convolution enables efficient parallel execution during training (both views are sketched below)
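
A minimal NumPy sketch of the recurrent and convolutional views, assuming a single-input single-output SSM with a toy state size; the discretization uses the bilinear (Tustin) transform as in the S4 paper, and all names and shapes are illustrative:

```python
import numpy as np

def discretize(A, B, step):
    """Bilinear (Tustin) transform: continuous (A, B) -> discrete (Ab, Bb)."""
    I = np.eye(A.shape[0])
    BL = np.linalg.inv(I - (step / 2.0) * A)
    Ab = BL @ (I + (step / 2.0) * A)
    Bb = (step * BL) @ B
    return Ab, Bb

def run_recurrent(Ab, Bb, C, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k,  y_k = C x_k."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb[:, 0] * u_k
        ys.append((C @ x)[0])
    return np.array(ys)

def run_convolutional(Ab, Bb, C, u):
    """Convolutional view: y = K * u with kernel K_k = C Ab^k Bb."""
    L = len(u)
    K = np.array([(C @ np.linalg.matrix_power(Ab, k) @ Bb)[0, 0] for k in range(L)])
    return np.convolve(u, K)[:L]

# Toy check that both views give the same outputs (random SSM, illustrative only)
rng = np.random.default_rng(0)
N, L = 4, 16
A = rng.normal(size=(N, N)) - N * np.eye(N)   # roughly stable
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)
Ab, Bb = discretize(A, B, step=0.1)
assert np.allclose(run_recurrent(Ab, Bb, C, u), run_convolutional(Ab, Bb, C, u))
```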

Special SSM Models (S4)

Definition

  • HiPPO matrices: Specially structured A, B matrices that let the latent state memorize the input (a sketch of the construction follows this list)
  • Motivation: Maintaining long-range historical context
  • Online function reconstruction: The latent state stores coefficients from which the input history can be reconstructed online
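
A minimal sketch of the HiPPO-LegS transition matrix, following the closed-form entries given in the HiPPO/S4 papers (the small state size in the usage line is only for illustration):

```python
import numpy as np

def make_hippo(N):
    """HiPPO-LegS matrix: A[n, k] = -sqrt(2n+1)*sqrt(2k+1) for n > k,
    -(n+1) on the diagonal, and 0 above the diagonal."""
    p = np.sqrt(1 + 2 * np.arange(N))        # sqrt(2n+1) for each n
    A = p[:, None] * p[None, :]               # outer product
    A = np.tril(A) - np.diag(np.arange(N))    # keep lower triangle, fix diagonal
    return -A

A = make_hippo(4)   # 4x4 toy example
```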

Efficient Computation

  • Challenges: The latent state is high-dimensional, so naively materializing the recurrence or the convolution kernel is expensive
  • Solutions: Imposing structure on A (e.g., diagonal plus low-rank) or using diagonal approximations
  • Algorithm simplification: Diagonal variants such as S4D keep the computation simple and efficient (see the sketch after this list)
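
To illustrate why the diagonal case is cheap: with a diagonal discrete state matrix, the convolution kernel is a sum of per-mode geometric series, computable with a Vandermonde-style matrix product instead of repeated matrix powers. This is a simplified sketch, not the full S4D algorithm:

```python
import numpy as np

def diagonal_ssm_kernel(a, b, c, L):
    """Kernel K_k = sum_n c_n * a_n**k * b_n for a diagonal discrete SSM.
    a, b, c: (complex) arrays of shape (N,); returns a real kernel of length L."""
    powers = a[None, :] ** np.arange(L)[:, None]   # (L, N) Vandermonde-style matrix
    K = powers @ (b * c)                           # weighted sum over modes
    return K.real
```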

Applications

Speech Classification

  • Data: High-resolution continuous audio waveform classification
  • Results: Strong performance on raw waveforms and generalization across sampling frequencies

Auto-Regressive Generation

  • Example: WaveNet-style speech generation
  • Benefits: Efficient sampling thanks to the fixed-size recurrent state and a suitable inductive bias (see the sketch below)
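
A sketch of autoregressive generation in the recurrent view: only the fixed-size state is carried between steps, so each new sample costs the same regardless of how long the generated sequence already is. The `sample_from` argument is a hypothetical placeholder for whatever output distribution the model uses:

```python
import numpy as np

def generate(Ab, Bb, C, u0, steps, sample_from=lambda y: y):
    """Autoregressive generation: feed each output back in as the next input."""
    x = np.zeros(Ab.shape[0])   # fixed-size state, reused at every step
    u = u0
    outputs = []
    for _ in range(steps):
        x = Ab @ x + Bb[:, 0] * u   # state update (cheaper still if Ab is diagonal)
        y = (C @ x).item()          # readout
        u = sample_from(y)          # placeholder: e.g. sample the next waveform value
        outputs.append(u)
    return outputs
```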

Time Series Prediction

  • Task: Predicting patient biometric measures from EKG data
  • Insight: Superior continuous signal modeling compared to Transformers

Practical Usage

Recommendations

  • Architectural integration: Use SSM layers as drop-in sequence layers inside deeper networks (a sketch follows this list)
  • Variations: Use diagonal variants (S4D) for practical implementations
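
A minimal PyTorch-style sketch of integrating an SSM layer into a deeper residual architecture; `make_ssm_layer` is a hypothetical factory standing in for an actual S4/S4D layer implementation, and all module names are illustrative:

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Residual block: LayerNorm -> SSM layer -> GELU -> pointwise Linear -> residual.
    The SSM layer is assumed to map (batch, length, d_model) -> (batch, length, d_model)."""
    def __init__(self, d_model, ssm_layer):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ssm = ssm_layer
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.out(torch.nn.functional.gelu(self.ssm(self.norm(x))))

class DeepSSM(nn.Module):
    """Stack of SSM blocks between an input encoder and an output head."""
    def __init__(self, d_input, d_model, d_output, n_layers, make_ssm_layer):
        super().__init__()
        self.encoder = nn.Linear(d_input, d_model)
        self.blocks = nn.ModuleList(
            [SSMBlock(d_model, make_ssm_layer(d_model)) for _ in range(n_layers)]
        )
        self.head = nn.Linear(d_model, d_output)

    def forward(self, x):                  # x: (batch, length, d_input)
        x = self.encoder(x)
        for block in self.blocks:
            x = block(x)
        return self.head(x)
```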

Current Research

  • Ongoing work on making the recurrent view faster and more flexible

Conclusion

  • Summary: SSMs bring classical state space models into deep learning, combining efficient sequence-to-sequence mapping with modern architectural integration
  • Future Work: Further simplifications and practical tutorials