ML//neural network//optimizer//Adam

- Adaptive Moment Estimation: momentum + per-parameter learning rates.


Adaptive Moment Estimation: momentum + per-parameter learning rates.

Tracks first moment (mean) and second moment (variance) of gradients.

Almost everyone uses Adam or AdamW (with weight decay). It just works.