ML//neural network//activation function
The non-linearity applied between layers: each layer's linear output (Wx + b) passes element-wise through a fixed non-linear function before it feeds the next layer.
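Without it, a stack of linear layers collapses into a single linear (affine) transform, since W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2). The network can then only model linear functions of its input and cannot learn curved decision boundaries.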
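The collapse can be checked numerically. Below is a minimal sketch in plain Python (no NumPy, to stay self-contained). The toy weights W1, b1, W2, b2 and the helper names are hypothetical and exist only for illustration.

```python
# Two stacked linear layers equal one linear layer,
# but a ReLU between them breaks that equivalence.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A (m x k) and B (k x n), both given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]

# Hypothetical toy weights (2 inputs -> 2 hidden -> 1 output).
W1 = [[1, -2], [3, 1]]
b1 = [1, -4]
W2 = [[2, 1]]
b2 = [0.5]

def two_linear(x):
    """Layer 2 applied directly to layer 1's output, with no activation."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The equivalent single layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_linear(x):
    return add(matvec(W, x), b)

def with_relu(x):
    """Same weights, but with a ReLU between the two layers."""
    return add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

for x in ([1, 2], [0, 0], [-1, 3]):
    print(x, two_linear(x), one_linear(x), with_relu(x))
```

Without the ReLU, `two_linear` and `one_linear` agree on every input. With the ReLU they diverge (e.g. at x = [1, 2]), so the extra layer actually adds expressive power.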
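ReLU, max(0, x), replaced sigmoid as the default for hidden layers, largely because its gradient does not saturate for positive inputs. GELU, x·Φ(x) with Φ the standard normal CDF, later became the usual choice in transformers such as BERT and GPT.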
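For reference, here is a sketch of the three activations as scalar Python functions, plus the sigmoid and ReLU derivatives to show the saturation difference. The function names are illustrative. `gelu_tanh` is the common tanh approximation of GELU (what PyTorch's `nn.GELU(approximate='tanh')` computes). Real frameworks apply these element-wise to tensors.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    """Derivative s(1 - s): at most 0.25 (at x = 0), near 0 for large |x| (saturation)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """1 for positive inputs (no saturation), 0 otherwise (0 chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU."""
    u = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(u))

for x in (-2.0, 0.0, 1.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x), gelu(x), gelu_tanh(x))
```

At x = 10 the sigmoid gradient has all but vanished, while ReLU's is still 1. GELU behaves like ReLU for large |x| but is smooth around zero.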