ML//neural network//superposition

2026-03-06

Every neuron moonlights, representing dozens of concepts at once. Each feature is a direction in activation space, written as a linear combination of neurons. With more features than neurons, those directions cannot all be mutually perpendicular, so the angles between them are close to, but not exactly, 90°.

The model has far more concepts to represent than it has neurons; superposition is the compression trick that lets them fit.
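A minimal sketch of that compression, assuming a toy setup that is not from any real model: 5 features squeezed into 2 neurons, with the feature directions spread evenly around a circle. The names `directions`, `encode`, and `decode` are hypothetical, chosen for illustration only.

```python
import math

N_FEATURES = 5  # concepts to represent (assumed toy size)
N_NEURONS = 2   # dimensions available to store them

# One unit-length direction per feature, spaced evenly around a circle.
directions = [
    (math.cos(2 * math.pi * k / N_FEATURES), math.sin(2 * math.pi * k / N_FEATURES))
    for k in range(N_FEATURES)
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode(feature_values):
    """Sum the feature directions, each weighted by how active its feature is."""
    return tuple(
        sum(x * w[j] for x, w in zip(feature_values, directions))
        for j in range(N_NEURONS)
    )


def decode(activation):
    """Read each feature back by projecting the activation onto its direction."""
    return [dot(activation, w) for w in directions]


# Only feature 0 is active.
activation = encode([1.0, 0.0, 0.0, 0.0, 0.0])
readout = decode(activation)

# Cosines between distinct feature directions (unit vectors, so dot = cosine).
overlaps = [
    dot(directions[i], directions[j])
    for i in range(N_FEATURES)
    for j in range(i + 1, N_FEATURES)
]

print([round(r, 3) for r in readout])
print([round(o, 3) for o in overlaps])
```

Feature 0 reads back as 1.0, but the other four features also read back non-zero (about 0.31 or -0.81) even though they were never active. None of the ten pairwise cosines is zero: that leftover signal is the price of fitting 5 directions into 2 dimensions.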

The more features a single neuron contributes to, the more superposition shows up in that neuron: its activation alone says less about which feature is actually present.
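One rough way to put a number on "how involved" a neuron is: count the features it carries meaningful weight for. This is only an illustrative proxy, not a standard metric; the weight table, the `THRESHOLD` cutoff, and the function name `features_per_neuron` are all assumptions made up for this sketch.

```python
# Hypothetical weights: rows are features, columns are neurons.
weights = [
    [0.9, 0.0, 0.1],
    [0.0, 0.8, 0.6],
    [0.7, 0.0, 0.7],
    [0.0, 0.0, 1.0],
]

# Assumed cutoff for "this neuron takes part in this feature".
THRESHOLD = 0.3


def features_per_neuron(weights, threshold):
    """For each neuron (column), count features whose |weight| meets the cutoff."""
    n_neurons = len(weights[0])
    return [
        sum(1 for row in weights if abs(row[j]) >= threshold)
        for j in range(n_neurons)
    ]


counts = features_per_neuron(weights, THRESHOLD)
print(counts)  # one count per neuron
```

With these made-up weights the counts are 2, 1, and 3: the second neuron serves a single feature, while the third is shared by three and is the hardest to read in isolation.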

Superposition makes mechanistic interpretability extremely hard: you can't point at one neuron and say "this means X".

The directions in activation space aren't random: they encode structured relationships, but they overlap and interfere.