ML//model//GPT//GPT-3//few-shot learning

2020-10-15

Give the model a few examples in the prompt, and it generalizes to new inputs.

Give the model a few examples in the prompt, and it generalizes to new inputs.

No gradient update, no fine-tuning: the model "learned to learn" during pre-training

Didn't work reliably below ~10B parameters. Scale was the key.

Mechanistically: works partly because induction heads read the examples and copy their patterns to the new input. The attention heads that implement pattern-matching are doing the heavy lifting.

Part of the broader in-context learning phenomenon: the model adapts behavior from context alone.