
State Space Models and Mamba: A New Architecture for Language Models

Jul 18, 2024


Introduction

  • Presenter: Luis Serrano from Serrano Academy
  • Topic: State space models and Mamba
  • Significance: A new architecture for building efficient large language models
  • Origin: Introduced by Gu and Dao

Combining RNNs and CNNs

  • Mamba merges concepts from recurrent neural networks (RNNs) and convolutional neural networks (CNNs): the same model can be run step by step like an RNN or in parallel as a convolution like a CNN (see the sketch below)
  • Aim: to generate language efficiently
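
To make the RNN/CNN connection concrete, here is a minimal NumPy sketch; the matrices are random placeholders rather than values from the Mamba paper. It checks that a linear state space model gives identical outputs whether it is run recurrently (RNN-style) or as a convolution (CNN-style):

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 4, 6
A = np.diag(rng.uniform(0.5, 0.95, d_state))  # state transition matrix
B = rng.normal(size=(d_state, 1))             # input (control) matrix
C = rng.normal(size=(1, d_state))             # observation matrix
x = rng.normal(size=seq_len)                  # a 1-D input sequence

# Recurrent mode (RNN-like): carry a hidden state through time.
h = np.zeros((d_state, 1))
y_recurrent = []
for t in range(seq_len):
    h = A @ h + B * x[t]
    y_recurrent.append((C @ h).item())

# Convolutional mode (CNN-like): precompute the kernel K_k = C A^k B,
# then each output is a convolution: y_t = sum_k K_k * x_{t-k}.
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item()
              for k in range(seq_len)])
y_conv = [np.dot(K[:t + 1], x[t::-1]) for t in range(seq_len)]

assert np.allclose(y_recurrent, y_conv)  # both views give the same output
```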

Example: Race Car

Variables and Measures

  • Input (x): Maintenance (e.g., topping off fluids, general maintenance)
  • State (h): Vehicle health (e.g., gas level, oil level, tire condition, motor condition)
  • Output (y): Performance (e.g., speed)

Functions/Matrices

  • A: Describes the natural wear and tear of the car (state transition matrix)
  • B: Effects of maintenance on the car's state (control matrix)
  • C: Performance based on the car's state (observation matrix)
  • D: Direct effect of maintenance on performance (usually neglected in language models)

Equations

  • State transition: $h_t = A h_{t-1} + B x_t$
  • Output: $y_t = C h_t + D x_t$
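
As a minimal sketch, the two equations transcribe directly into code (NumPy assumed; array shapes are whatever makes the products compatible):

```python
import numpy as np

def ssm_step(h_prev, x_t, A, B, C, D):
    """One step of the state space recurrence above."""
    h_t = A @ h_prev + B @ x_t   # state transition: decay old state, add input
    y_t = C @ h_t + D @ x_t      # output: observe state, plus direct input term
    return h_t, y_t
```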

Numerical Example

  • Gas: retains 0.9 of the previous day's level
  • Oil: retains 0.95 of the previous day's level
  • Tires: retain 0.8 of the previous day's condition
  • Motor: retains 0.85 of the previous day's condition
  • State transition matrix (A): Encodes these decay factors plus mutual influences, such as good tires saving gas
  • Control matrix (B): Influence of the input (maintenance) on the state
  • Observation matrix (C): Converts the state into performance
  • Direct matrix (D): Direct influence of maintenance on performance (a numeric sketch follows below)
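
Here is a hedged numeric sketch of this example: the diagonal decay factors come from the notes above, while every other number, including the off-diagonal entry for good tires saving gas, is invented for illustration:

```python
import numpy as np

A = np.array([
    [0.90, 0.00, 0.05, 0.00],  # gas keeps 0.90; good tires save a little gas
    [0.00, 0.95, 0.00, 0.00],  # oil keeps 0.95
    [0.00, 0.00, 0.80, 0.00],  # tires keep 0.80
    [0.00, 0.00, 0.00, 0.85],  # motor keeps 0.85
])                                           # state transition matrix
B = np.array([[1.0], [1.0], [0.0], [0.0]])   # maintenance tops off gas and oil
C = np.array([[0.3, 0.2, 0.25, 0.25]])       # weighs the state into performance

h = np.ones((4, 1))    # gas, oil, tires, motor: start in perfect condition
x = np.array([[0.1]])  # a little maintenance every day

for day in range(1, 4):
    h = A @ h + B @ x  # components decay, maintenance adds back
    y = C @ h          # performance (D neglected, as in language models)
    print(f"day {day}: performance = {y.item():.3f}")
```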

Language Generation

Variables for Language Models

  • Context (State): A large vector encoding the details of the discussion so far
  • Last Word (Input): The most recent word in the context (see the sketch below)
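
A toy sketch of this mapping, assuming NumPy; all sizes, matrices, and token ids below are hypothetical placeholders, and the actual Mamba architecture is considerably more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, d_embed, d_state = 100, 16, 32
E = rng.normal(size=(vocab, d_embed))          # word embeddings: the input x_t
A = np.diag(rng.uniform(0.9, 0.99, d_state))   # how older context fades
B = rng.normal(size=(d_state, d_embed)) * 0.1  # folds the last word into the state
C = rng.normal(size=(vocab, d_state)) * 0.1    # maps context to next-word scores

h = np.zeros(d_state)            # the context vector starts empty
for token_id in [3, 17, 42]:     # some token ids standing in for words
    h = A @ h + B @ E[token_id]  # state transition: update the context
scores = C @ h                   # output: a score for every word in the vocabulary
next_token = int(np.argmax(scores))  # pick the most likely next word
```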