ML//SSM//Mamba
- Selective State Spaces (Gu & Dao, 2023)
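Key innovation: make state transitions input-dependent (selective), not fixed. The step size Δ and the B, C projections are computed from the current token, so the model can decide per token what to write into its state and what to let decay.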
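Rough sketch of what "selective" means in the recurrence, to make the note concrete. This is a naive sequential NumPy loop, not the paper's hardware-aware parallel scan, and it leaves out the gating, convolution, and low-rank Δ projection of the real block. The names `selective_scan`, `W_delta`, `W_B`, `W_C` are made up for illustration. Assumes a diagonal A per channel, exp(ΔA) for the state decay, and the simplified ΔB for the input term.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the step size positive.
    return np.logaddexp(0.0, x)

def selective_scan(x, A, W_delta, W_B, W_C):
    """Naive selective SSM recurrence (illustrative only).

    x:       (T, D) input sequence
    A:       (D, N) diagonal state matrix per channel, entries negative
    W_delta: (D, D) hypothetical projection producing the step size
    W_B:     (D, N) hypothetical projection producing B_t
    W_C:     (D, N) hypothetical projection producing C_t
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))   # hidden state: N dims per channel
    y = np.zeros((T, D))
    for t in range(T):
        # Selectivity: delta, B, C all depend on the current input x[t].
        delta = softplus(x[t] @ W_delta)       # (D,)
        B = x[t] @ W_B                         # (N,)
        C = x[t] @ W_C                         # (N,)
        # Discretize: the effective transition now varies per token.
        A_bar = np.exp(delta[:, None] * A)     # (D, N), values in (0, 1)
        B_bar = delta[:, None] * B[None, :]    # (D, N)
        h = A_bar * h + B_bar * x[t][:, None]  # state update
        y[t] = h @ C                           # readout, (D,)
    return y

rng = np.random.default_rng(0)
T, D, N = 6, 4, 3
x = rng.standard_normal((T, D))
A = -np.exp(rng.standard_normal((D, N)))       # negative, so the state decays
W_delta = 0.1 * rng.standard_normal((D, D))
W_B = 0.1 * rng.standard_normal((D, N))
W_C = 0.1 * rng.standard_normal((D, N))
y = selective_scan(x, A, W_delta, W_B, W_C)
```

In a fixed (non-selective) SSM, `delta`, `B`, and `C` would be constants outside the loop, which is what allows the convolutional form. Making them depend on `x[t]` gives that up and is why a scan is needed instead.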
Matches or beats same-size transformers on language benchmarks at small scale (the paper's models top out around 3B parameters).
But hasn't dethroned transformers at frontier scale. Hybrid architectures (Mamba layers + attention layers) seem to be the pragmatic path. Interesting dead end? Or early innings?