ML//neural network//superposition

Every neuron moonlights — representing dozens of concepts at once, because the model has more ideas than it has neurons to store them in. Neurons represent many different features simultaneously through linear combination — angles between feature directions are not perfectly perpendicular.


Every neuron moonlights — representing dozens of concepts at once, because the model has more ideas than it has neurons to store them in. Neurons represent many different features simultaneously through linear combination — angles between feature directions are not perfectly perpendicular.

The model has far more concepts to represent than it has neurons — superposition is the compression trick.

The more involved a neuron is across features, the more superposition it exhibits.

Makes mechanistic interpretability extremely hard — you can't point at one neuron and say "this means X".

The directions in activation space aren't random — they encode structured relationships, but they overlap and interfere.