ML//scaling laws//Chinchilla
DeepMind (Hoffmann et al., 2022): most LLMs were undertrained relative to their size.
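Given a fixed compute budget, train a smaller model on more tokens than earlier scaling laws (Kaplan et al., 2020) recommended: parameters and training tokens should grow in roughly equal proportion, which works out to about 20 tokens per parameter.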
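Chinchilla (70B parameters, 1.4T tokens) outperformed Gopher (280B parameters, 300B tokens) at about the same training compute, using 4× fewer parameters and over 4× more tokens.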
Shifted the industry from "bigger model" to "more data, right-sized model".
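
The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below is not the paper's fitting procedure. It assumes two commonly cited rules of thumb: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and a compute-optimal ratio of about 20 tokens per parameter. The function names and constants are illustrative. Gopher's 300B training tokens are taken from the published model description.

```python
import math

# Assumption: training cost is approximately 6 FLOPs per parameter per token.
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: rule-of-thumb compute-optimal ratio of tokens to parameters.
TOKENS_PER_PARAM = 20.0


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute, C = 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal(flops_budget: float,
                    tokens_per_param: float = TOKENS_PER_PARAM) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) with tokens = ratio * params.

    Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    """
    if flops_budget <= 0 or tokens_per_param <= 0:
        raise ValueError("budget and ratio must be positive")
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    # Gopher: 280B parameters trained on 300B tokens.
    gopher_budget = training_flops(280e9, 300e9)
    params, tokens = compute_optimal(gopher_budget)
    print(f"budget: {gopher_budget:.2e} FLOPs")
    print(f"suggested params: {params / 1e9:.0f}B, tokens: {tokens / 1e12:.2f}T")
```

Under these assumptions, Gopher's budget of about 5e23 FLOPs maps to roughly 65B parameters and 1.3T tokens, close to Chinchilla's actual 70B and 1.4T configuration.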