ML//neural network//optimizer//Adam
- Adaptive Moment Estimation: momentum + per-parameter learning rates.
Adaptive Moment Estimation: momentum + per-parameter learning rates.
Tracks first moment (mean) and second moment (variance) of gradients.
Almost everyone uses Adam or AdamW (with weight decay). It just works.