ML//neural network//activation function

2017-10-22

- The non-linearity between layers.

Without it, stacked linear layers collapse to a single linear transform, since W2(W1 x) = (W2 W1) x, so the network can't learn curves.
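
A minimal sketch of the collapse in pure Python. The weights `W1` and `W2` are made up for illustration and biases are left out; the helper names are hypothetical, not from any library.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A and B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(v):
    """Apply ReLU elementwise to a vector."""
    return [max(0.0, vi) for vi in v]


# Hypothetical weights, chosen only to make the effect easy to see.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]


def two_linear_layers(x):
    # Two layers with nothing in between.
    return matvec(W2, matvec(W1, x))


def collapsed(x):
    # The same map written as one layer with weights W2 @ W1.
    return matvec(matmul(W2, W1), x)


def with_relu(x):
    # Same weights, but with a ReLU between the layers.
    return matvec(W2, relu(matvec(W1, x)))
```

With these weights the two linear layers multiply out to the zero map, while the ReLU version computes |x[0] - x[1]|, which no single linear layer can represent.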

ReLU replaced sigmoid as the default hidden-layer activation, then GELU became the usual choice in transformers.
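
For reference, a sketch of the three functions named above, as scalar Python. The GELU here is the exact form x * Phi(x), with Phi the standard normal CDF; some implementations use a tanh approximation instead, which is not shown.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x):
    """GELU, exact form: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Unlike ReLU, GELU is smooth around zero and lets small negative values through slightly attenuated rather than cutting them to exactly zero.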