ML//neural network//feature

2026-03-05

A direction in activation space that represents a concept: "Michael Jordan", "basketball", "plural", "code syntax".

A direction in activation space that represents a concept: "Michael Jordan", "basketball", "plural", "code syntax".

MLP layers detect and add features: if the current embedding aligns with a feature direction above threshold (ReLU), that feature gets added.

Example: if a vector already points toward "Michael" and "Jordan", the MLP adds the "basketball" direction.

Features can be distributed across neurons (superposition) or clean (monosemantic)

Sparse autoencoders try to extract the clean feature set from superposed activations.

Features are directions, not neurons. A single neuron might participate in many features.