ML//model//GPT//GPT-3//few-shot learning

Give the model a few examples in the prompt, and it generalizes to new inputs.


No gradient updates, no fine-tuning: the model "learned to learn" during pre-training.
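
A minimal sketch of what "examples in the prompt" means in practice. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the translation pairs are illustrative assumptions, not a format prescribed by any particular model. The point is that the only thing that changes between tasks is the text; no weights are touched.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Input",    # hypothetical label, chosen for illustration
    output_label: str = "Output",  # hypothetical label, chosen for illustration
) -> str:
    """Format demonstrations plus a new query as a single text prompt."""
    # Each demonstration is written as a completed input/output pair.
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    # The final block leaves the output empty for the model to complete.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Made-up English-to-French demonstrations.
examples = [("cheese", "fromage"), ("house", "maison"), ("cat", "chat")]
prompt = build_few_shot_prompt(examples, "dog")
print(prompt)
```

The resulting string would be sent to the model as-is; swapping the demonstrations swaps the task.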

Didn't work reliably below roughly 10B parameters: smaller models gained much less from in-context examples. Scale was the key.

Mechanistically: works partly because of induction heads, attention heads that find an earlier occurrence of the current token and copy what followed it. This match-and-copy step carries patterns from the examples over to the new input and does much of the heavy lifting.
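
A toy sketch of the match-and-copy behavior attributed to induction heads. This is a hand-written rule over exact token matches, assumed here purely for illustration; real induction heads are learned attention circuits with soft, approximate matching, and `induction_predict` is a hypothetical name.

```python
from typing import List, Optional


def induction_predict(tokens: List[str]) -> Optional[str]:
    """Predict the next token by prefix matching and copying.

    Finds the most recent earlier occurrence of the last token and
    returns the token that followed it, or None if there is no match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the last one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy whatever followed the earlier occurrence.
            return tokens[i + 1]
    return None


# Pattern "... A B ... A" leads to predicting "B".
context = ["A", "B", "C", "A"]
print(induction_predict(context))  # B
```

In a few-shot prompt the same rule would see the separator between input and output recur and copy the kind of token that followed it in earlier examples, which is only a crude stand-in for what the full model does.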

Part of the broader in-context learning phenomenon: the model adapts behavior from context alone.