ML//neural network//optimizer

How to update weights after computing gradients.


How to update weights after computing gradients.

SGD: multiply gradient by learning rate, subtract from weight. Simple but slow.

Momentum: accumulate past gradients, like a ball rolling downhill.

Adam: adaptive learning rates per parameter — the default choice.