ML//neural network//activation function

- The non-linearity between layers.


Without it, a stack of linear layers collapses to a single linear transform, so no amount of depth lets the network learn curves.
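The collapse can be checked by hand: W2(W1·x + b1) + b2 = (W2·W1)·x + (W2·b1 + b2), which is again a single linear layer. Below is a minimal sketch of that identity in plain Python. The weights and the helper names (`matvec`, `matmul`, `relu_vec`, and so on) are made up for illustration and are not from any library.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A (m x k) by matrix B (k x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def relu_vec(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, a) for a in v]

# Illustrative weights: layer 1 maps 2 -> 3, layer 2 maps 3 -> 1.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.25]

def two_linear_layers(x):
    """Two stacked layers with no activation in between."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same function written as one layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def collapsed_layer(x):
    """Single linear layer equivalent to two_linear_layers."""
    return add(matvec(W, x), b)

def with_relu(x):
    """Same weights, but with ReLU between the layers."""
    return add(matvec(W2, relu_vec(add(matvec(W1, x), b1))), b2)

x = [1.0, 1.0]
print(two_linear_layers(x), collapsed_layer(x))  # identical outputs
print(with_relu(x))                              # differs once a hidden unit is clipped
```

With the ReLU in place there is no single `W` and `b` that reproduces the network on every input, which is what lets depth add expressive power.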

ReLU replaced sigmoid as the default hidden-layer activation, then GELU became the common choice in transformers.
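For reference, the three functions named above can be written as scalar functions using only the standard library. This is a sketch of the textbook definitions (GELU in its exact erf form, x·Φ(x)), not the implementation of any particular framework; real libraries operate on tensors and sometimes use a tanh approximation for GELU.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)

def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)

def gelu(x: float) -> float:
    """GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={v:+.1f}  sigmoid={sigmoid(v):.4f}  relu={relu(v):.4f}  gelu={gelu(v):+.4f}")
```

Sigmoid saturates at both ends, ReLU is exactly zero for negative inputs, and GELU follows ReLU for large |x| but is smooth around zero and slightly negative for small negative inputs.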