ML//neural network//optimizer
How to update weights after computing gradients.
How to update weights after computing gradients.
SGD: multiply gradient by learning rate, subtract from weight. Simple but slow.
Momentum: accumulate past gradients, like a ball rolling downhill.
Adam: adaptive learning rates per parameter — the default choice.