Selective State Spaces (Gu & Dao, 2023)

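Key innovation: make the SSM parameters input-dependent (selective), not fixed. The step size Δ and the B and C projections are computed from the current token, so the model can decide per token what to write into the state and what to forget. A itself stays a learned constant, but the discretized transition exp(ΔA) still varies with the input through Δ.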
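
To make "input-dependent transitions" concrete, here is a minimal NumPy sketch of a selective recurrence, written as a plain sequential loop. It is not the paper's implementation: the function name `selective_scan`, the weight names (`W_B`, `W_C`, `W_delta`, `b_delta`), the full-rank Δ projection, and the simplified Euler-style discretization of B are all assumptions made for illustration, and the real model uses a fused hardware-aware parallel scan instead of a Python loop.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the step size positive.
    return np.logaddexp(0.0, x)


def selective_scan(x, A, W_B, W_C, W_delta, b_delta):
    """Sequential selective SSM over one sequence (illustrative sketch).

    x:       (L, D) input sequence, L steps and D channels
    A:       (D, N) fixed diagonal state matrix per channel (negative entries)
    W_B:     (D, N) hypothetical projection giving B_t from x_t
    W_C:     (D, N) hypothetical projection giving C_t from x_t
    W_delta: (D, D) hypothetical projection giving the step size from x_t
    b_delta: (D,)   bias for the step size
    returns  (L, D) output sequence
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))  # one N-dimensional state per channel
    y = np.empty((L, D))
    for t in range(L):
        # Everything below depends on x[t]: this is the "selective" part.
        delta = softplus(x[t] @ W_delta + b_delta)   # (D,)
        B_t = x[t] @ W_B                             # (N,)
        C_t = x[t] @ W_C                             # (N,)
        A_bar = np.exp(delta[:, None] * A)           # (D, N) per-token decay
        B_bar = delta[:, None] * B_t[None, :]        # (D, N) simplified discretization
        h = A_bar * h + B_bar * x[t][:, None]        # state update
        y[t] = h @ C_t                               # (D,) readout
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, D, N = 6, 4, 3
    x = rng.normal(size=(L, D))
    A = -np.exp(rng.normal(size=(D, N)))  # negative, so exp(delta * A) lies in (0, 1)
    W_B = rng.normal(size=(D, N))
    W_C = rng.normal(size=(D, N))
    W_delta = rng.normal(size=(D, D))
    b_delta = np.zeros(D)
    print(selective_scan(x, A, W_B, W_C, W_delta, b_delta).shape)
```

With fixed Δ, B, and C (no dependence on `x[t]`) the same loop reduces to an ordinary time-invariant SSM, which is the contrast the note is pointing at: a large Δ for a token lets it overwrite the state, a small Δ lets the state carry through mostly unchanged.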

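Matches Transformer quality on language modeling at small scale: the paper's experiments go up to roughly 3B parameters, where Mamba is reported to outperform Transformers of the same size and match ones about twice its size.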

But hasn't dethroned Transformers at frontier scale. Hybrid architectures (Mamba layers + attention layers) seem to be the pragmatic path. Interesting dead end? Or early innings?