ML//model//GPT//GPT-3//few-shot learning
Give the model a few examples in the prompt, and it generalizes to new inputs.
Give the model a few examples in the prompt, and it generalizes to new inputs.
No gradient update, no fine-tuning — the model "learned to learn" during pre-training
Didn't work reliably below ~10B parameters. Scale was the key.
Mechanistically: works partly because induction heads read the examples and copy their patterns to the new input — the attention heads that implement pattern-matching are doing the heavy lifting.
Part of the broader in-context learning phenomenon: the model adapts behavior from context alone.