ML//neural network//activation function//sigmoid

2026-07-04

Squashes any real number into (0, 1) via 1/(1+e^-x). The classic activation function before ReLU.
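A minimal sketch of the formula above, stdlib Python only. The branch on the sign of x is an added assumption, not part of the definition: it avoids overflow in e^-x for very negative inputs. The function name is illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        # e^-x is at most 1 here, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^-x could overflow; use the equivalent e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for x in (-5.0, 0.0, 5.0):
        print(x, sigmoid(x))
```

Mathematically the output stays strictly inside (0, 1); in floating point it can round to exactly 0.0 or 1.0 for extreme inputs.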

The output reads naturally as a probability or a valve setting (0 = closed, 1 = open). That is why LSTM gates use it.
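The valve reading can be sketched with a single scalar gate. This is a deliberately simplified illustration, not a full LSTM cell: `gated` and `gate_logit` are hypothetical names, and a real gate computes its logit from learned weights over the input and hidden state.

```python
import math


def sigmoid(x: float) -> float:
    # Direct formula; fine for the moderate inputs used in this sketch.
    return 1.0 / (1.0 + math.exp(-x))


def gated(value: float, gate_logit: float) -> float:
    """Scale `value` by a sigmoid gate: near 0 blocks it, near 1 passes it."""
    gate = sigmoid(gate_logit)
    return gate * value


if __name__ == "__main__":
    for logit in (-20.0, 0.0, 20.0):
        print(logit, gated(2.0, logit))
```

A strongly negative logit closes the valve, a logit of 0 lets half the value through, and a strongly positive logit passes it almost unchanged.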

Saturates at both ends: the gradient, sigmoid(x) * (1 - sigmoid(x)), peaks at only 0.25 at x = 0 and falls toward 0 as |x| grows, which feeds the vanishing gradient problem in deep nets.
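A small sketch of the saturation, using the standard derivative sigmoid(x) * (1 - sigmoid(x)). The depth calculation is a simplification made for illustration: it ignores weights and only multiplies the best-case activation gradient of 0.25 per layer. Function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    # Direct formula; fine for the moderate inputs used in this sketch.
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def best_case_chain(depth: int) -> float:
    """Upper bound on the product of `depth` sigmoid gradients (weights ignored)."""
    return 0.25 ** depth


if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0, 10.0):
        print(x, sigmoid_grad(x))
    print("10 layers, best case:", best_case_chain(10))
```

Even in the best case, ten stacked sigmoids shrink the gradient by a factor below one in a million, before any saturation is taken into account.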

Still standard for the final unit of binary classification; mostly replaced by ReLU / GELU in hidden layers.
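A sketch of the binary-classification use: the final unit emits a logit, the sigmoid turns it into P(class = 1), and training typically minimizes binary cross-entropy. The function names and the 0.5 threshold are assumptions for illustration; the loss is computed directly from the logit, a common rearrangement that avoids taking the log of 0.

```python
import math


def sigmoid(x: float) -> float:
    """Overflow-safe logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict(logit: float, threshold: float = 0.5) -> int:
    """Class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(logit) >= threshold else 0


def bce_with_logit(logit: float, label: int) -> float:
    """Binary cross-entropy -[y log p + (1 - y) log(1 - p)], with p = sigmoid(logit).

    Rearranged as max(z, 0) - z*y + log(1 + e^-|z|) so it stays finite
    for large |z|.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    for logit, label in ((2.0, 1), (2.0, 0), (-3.0, 0)):
        print(logit, label, predict(logit), bce_with_logit(logit, label))
```

With the default threshold, the predicted class flips exactly where the logit crosses 0.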