
State Space Models and Mamba: A New Architecture for Language Models

Jul 18, 2024


Introduction

  • Presenter: Luis Serrano from Serrano Academy
  • Topic: State space models and Mamba
  • Significance: A new architecture for building efficient large language models
  • Origin: Introduced by Gu and Dao

Combining RNNs and CNNs

  • Mamba merges concepts from recurrent neural networks (RNNs) and convolutional neural networks (CNNs): the same model can be run step by step like an RNN or in parallel as a convolution like a CNN (see the sketch below)
  • Aim: to generate language efficiently
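
To make the RNN/CNN connection concrete, here is a minimal NumPy sketch; the matrices are random placeholders rather than values from the Mamba paper. It checks that a linear state space model gives identical outputs whether it is run recurrently (RNN-style) or as a convolution (CNN-style):

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 4, 6
A = np.diag(rng.uniform(0.5, 0.95, d_state))  # state transition matrix
B = rng.normal(size=(d_state, 1))             # input (control) matrix
C = rng.normal(size=(1, d_state))             # observation matrix
x = rng.normal(size=seq_len)                  # a 1-D input sequence

# Recurrent mode (RNN-like): carry a hidden state through time.
h = np.zeros((d_state, 1))
y_recurrent = []
for t in range(seq_len):
    h = A @ h + B * x[t]
    y_recurrent.append((C @ h).item())

# Convolutional mode (CNN-like): precompute the kernel K_k = C A^k B,
# then each output is a convolution: y_t = sum_k K_k * x_{t-k}.
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item()
              for k in range(seq_len)])
y_conv = [np.dot(K[:t + 1], x[t::-1]) for t in range(seq_len)]

assert np.allclose(y_recurrent, y_conv)  # both views give the same output
```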

Example: Race Car

Variables and Measures

  • Input (x): Maintenance (e.g., topping off fluids, general maintenance)
  • State (h): Vehicle health (e.g., gas level, oil level, tire condition, motor condition)
  • Output (y): Performance (e.g., speed)

Functions/Matrices

  • A: Describes the natural wear and tear of the car (state transition matrix)
  • B: Effects of maintenance on the car's state (control matrix)
  • C: Performance based on the car's state (observation matrix)
  • D: Direct effect of maintenance on performance (usually neglected in language models)

Equations

  • State transition: $h_t = A h_{t-1} + B x_t$
  • Output: $y_t = C h_t + D x_t$
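
As a minimal sketch, the two equations transcribe directly into code (NumPy assumed; array shapes are whatever makes the products compatible):

```python
import numpy as np

def ssm_step(h_prev, x_t, A, B, C, D):
    """One step of the state space recurrence above."""
    h_t = A @ h_prev + B @ x_t   # state transition: decay old state, add input
    y_t = C @ h_t + D @ x_t      # output: observe state, plus direct input term
    return h_t, y_t
```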

Numerical Example

  • Gas: retains 0.9 of the previous day's level
  • Oil: retains 0.95 of the previous day's level
  • Tires: retain 0.8 of the previous day's condition
  • Motor: retains 0.85 of the previous day's condition
  • State transition matrix (A): Encodes these decay factors plus mutual influences, such as good tires saving gas
  • Control matrix (B): Influence of the input (maintenance) on the state
  • Observation matrix (C): Converts the state into performance
  • Direct matrix (D): Direct influence of maintenance on performance (a numeric sketch follows below)
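
Here is a hedged numeric sketch of this example: the diagonal decay factors come from the notes above, while every other number, including the off-diagonal entry for good tires saving gas, is invented for illustration:

```python
import numpy as np

A = np.array([
    [0.90, 0.00, 0.05, 0.00],  # gas keeps 0.90; good tires save a little gas
    [0.00, 0.95, 0.00, 0.00],  # oil keeps 0.95
    [0.00, 0.00, 0.80, 0.00],  # tires keep 0.80
    [0.00, 0.00, 0.00, 0.85],  # motor keeps 0.85
])                                           # state transition matrix
B = np.array([[1.0], [1.0], [0.0], [0.0]])   # maintenance tops off gas and oil
C = np.array([[0.3, 0.2, 0.25, 0.25]])       # weighs the state into performance

h = np.ones((4, 1))    # gas, oil, tires, motor: start in perfect condition
x = np.array([[0.1]])  # a little maintenance every day

for day in range(1, 4):
    h = A @ h + B @ x  # components decay, maintenance adds back
    y = C @ h          # performance (D neglected, as in language models)
    print(f"day {day}: performance = {y.item():.3f}")
```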

Language Generation

Variables for Language Models

  • Context (State): A large vector encoding the details of the discussion so far
  • Last Word (Input): The most recent word in the context (see the sketch below)
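
A toy sketch of this mapping, assuming NumPy; all sizes, matrices, and token ids below are hypothetical placeholders, and the actual Mamba architecture is considerably more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, d_embed, d_state = 100, 16, 32
E = rng.normal(size=(vocab, d_embed))          # word embeddings: the input x_t
A = np.diag(rng.uniform(0.9, 0.99, d_state))   # how older context fades
B = rng.normal(size=(d_state, d_embed)) * 0.1  # folds the last word into the state
C = rng.normal(size=(vocab, d_state)) * 0.1    # maps context to next-word scores

h = np.zeros(d_state)            # the context vector starts empty
for token_id in [3, 17, 42]:     # some token ids standing in for words
    h = A @ h + B @ E[token_id]  # state transition: update the context
scores = C @ h                   # output: a score for every word in the vocabulary
next_token = int(np.argmax(scores))  # pick the most likely next word
```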