ML//neural network//superposition
Every neuron moonlights — representing dozens of concepts at once, because the model has more ideas than it has neurons to store them in. Neurons represent many different features simultaneously through linear combination — angles between feature directions are not perfectly perpendicular.
Every neuron moonlights — representing dozens of concepts at once, because the model has more ideas than it has neurons to store them in. Neurons represent many different features simultaneously through linear combination — angles between feature directions are not perfectly perpendicular.
The model has far more concepts to represent than it has neurons — superposition is the compression trick.
The more involved a neuron is across features, the more superposition it exhibits.
Makes mechanistic interpretability extremely hard — you can't point at one neuron and say "this means X".
The directions in activation space aren't random — they encode structured relationships, but they overlap and interfere.