ML//SSM//Mamba

2023-11-10

Selective State Spaces (Gu & Dao, 2023)

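Key innovation: make the state-space parameters input-dependent (selective) rather than fixed. In Mamba, the step size Δ and the matrices B and C are computed from the current token, so the model can decide what to write into its state and what to ignore. Earlier SSMs such as S4 apply the same parameters at every time step.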
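
Rough sketch of what "input-dependent transitions" means as a plain recurrence, written in NumPy so the mechanics are visible. This is not the paper's implementation: the projection names (W_B, W_C, W_dt), the shapes, and the simplified discretization of B are assumptions for illustration, and the real model uses a fused parallel scan rather than a Python loop.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the step size positive.
    return np.logaddexp(0.0, x)

def selective_scan(x, A, W_B, W_C, W_dt):
    """Sequential selective SSM over one sequence (illustrative only).

    x:    (T, D) input sequence
    A:    (D, N) diagonal state matrix per channel (negative entries for stability)
    W_B:  (D, N) hypothetical projection producing B_t from x_t
    W_C:  (D, N) hypothetical projection producing C_t from x_t
    W_dt: (D, D) hypothetical projection producing the step size from x_t
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))   # hidden state, one N-dim state per channel
    y = np.zeros((T, D))
    for t in range(T):
        # The selective part: dt, B, C all depend on the current input x[t].
        dt = softplus(x[t] @ W_dt)          # (D,)
        B = x[t] @ W_B                      # (N,)
        C = x[t] @ W_C                      # (N,)
        # Discretize: exponential for A, simplified (Euler-style) rule for B.
        A_bar = np.exp(dt[:, None] * A)     # (D, N), values in (0, 1) when A < 0
        B_bar = dt[:, None] * B[None, :]    # (D, N)
        # State update and readout.
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = h @ C
    return y

# Example call with random weights (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
T, D, N = 6, 4, 3
x = rng.normal(size=(T, D))
A = -np.exp(rng.normal(size=(D, N)))
y = selective_scan(x, A,
                   rng.normal(size=(D, N)),
                   rng.normal(size=(D, N)),
                   rng.normal(size=(D, D)))
```

With a fixed (non-selective) SSM, dt, B and C would be constants computed once outside the loop. Here a large dt lets a token overwrite the state, and a small dt lets the state pass through almost unchanged.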

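Matches transformer quality on language benchmarks at small scale (the paper's largest model is roughly 3B parameters), with compute that grows linearly in sequence length.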

Hasn't dethroned transformers at frontier scale, though. Hybrid architectures (Mamba layers interleaved with attention layers) seem to be the pragmatic path. Interesting dead end, or early innings?