zero-shot learning

2026-03-01

Solving a task with no examples, just an instruction. "Translate this to French: Hello" with zero demonstrations. The model relies entirely on its pre-training knowledge to understand the task format.
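
The difference between zero-shot and few-shot is visible in the prompt text alone. Below is a minimal Python sketch. The `=>` separator loosely follows the translation examples in the GPT-3 paper. The function names are hypothetical, and no model is called.

```python
from __future__ import annotations

# Hypothetical helpers: they only build the prompt text; no model is called.


def zero_shot_prompt(instruction: str, query: str) -> str:
    """Task description plus the query, with no solved demonstrations."""
    return f"{instruction}\n{query} =>"


def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Same task description, preceded by solved demonstrations."""
    demos = "\n".join(f"{src} => {tgt}" for src, tgt in examples)
    return f"{instruction}\n{demos}\n{query} =>"


if __name__ == "__main__":
    task = "Translate English to French:"
    print(zero_shot_prompt(task, "Hello"))
    print("---")
    print(few_shot_prompt(task, [("cheese", "fromage"), ("thank you", "merci")], "Hello"))
```

The only difference is the demonstration lines. In both cases the model is expected to continue the text after the final `=>`.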


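GPT-3 made the usual capability ordering visible: zero-shot < few-shot < fine-tuned. This is a tendency, not a rule. On some benchmarks, few-shot GPT-3 was competitive with fine-tuned state of the art. The gap between zero-shot and few-shot is often smaller than expected, especially for tasks whose format is common in the pre-training data.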

Why it works: pre-training covers hundreds of billions of tokens (roughly 300B for GPT-3). During that training the model sees countless instances of translation, summarization, and question answering written out in ordinary text. A zero-shot prompt works by phrasing the request so that it cues those learned patterns.

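CLIP is zero-shot by design. It embeds images and text into a shared latent space, so it can classify an image against an arbitrary set of text labels with no training on that label set. Each label is typically wrapped in a prompt such as "a photo of a {label}", and CLIP picks the label whose text embedding has the highest cosine similarity to the image embedding.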
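
The classification step above can be sketched in a few lines of Python. This sketch makes several assumptions. The 3-d vectors are invented stand-ins for real encoder outputs. `zero_shot_classify` is a hypothetical helper, not CLIP's API. The temperature of 0.01 is an assumed value, whereas CLIP learns its scale during training.

```python
from __future__ import annotations

import math

# Toy 3-d vectors standing in for encoder outputs (hypothetical values).
# In CLIP, an image encoder and a text encoder would produce these embeddings.
IMAGE_EMB = [0.9, 0.1, 0.2]
LABEL_EMBS = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.1, 1.0, 0.0],
    "a photo of a car": [0.0, 0.1, 1.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, label_embs, temperature=0.01):
    """Pick the label whose text embedding is most similar to the image embedding.

    Returns (best_label, probabilities); probabilities are a softmax over
    cosine similarities divided by `temperature` (an assumed value here).
    """
    logits = {label: cosine(image_emb, emb) / temperature for label, emb in label_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    label, probs = zero_shot_classify(IMAGE_EMB, LABEL_EMBS)
    print(label)  # → a photo of a dog
```

Adding a new class only requires embedding one more text prompt. Nothing is retrained, which is what makes the classifier zero-shot.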

The practical bridge to prompt engineering: zero-shot performance depends heavily on how the instruction is phrased. Prompt engineering therefore aims to close the gap between zero-shot and few-shot without putting examples in the prompt.
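
One simple form of this is to choose among candidate instruction phrasings by scoring each one on a small labeled dev set. Those labels are used only for the choice and never appear in the prompt, so inference stays zero-shot. The following is a sketch under stated assumptions. `model` is any callable that maps a prompt string to a completion. `toy_model` is a hypothetical stub standing in for a real LLM. The templates and dev set are invented for illustration.

```python
from __future__ import annotations

from typing import Callable

# Labeled dev examples (invented). They are used only to choose the phrasing
# and are never placed in the prompt, so inference stays zero-shot.
DEV_SET = [("great movie, loved it", "positive"), ("boring and too long", "negative")]

TEMPLATES = [
    "{x}",
    "Review: {x}\nIs the sentiment positive or negative? Answer:",
]


def toy_model(prompt: str) -> str:
    """Hypothetical stub for an LLM: it only follows the task when the
    instruction names the allowed answers."""
    if "positive or negative" not in prompt:
        return "I'm not sure what you want."
    return "positive" if "great" in prompt else "negative"


def accuracy(model: Callable[[str], str], template: str, dev_set) -> float:
    """Fraction of dev examples the model answers correctly under one template."""
    hits = sum(model(template.format(x=x)).strip() == y for x, y in dev_set)
    return hits / len(dev_set)


def best_template(model, templates, dev_set):
    """Score every candidate instruction and return the best one with all scores."""
    scores = {t: accuracy(model, t, dev_set) for t in templates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = best_template(toy_model, TEMPLATES, DEV_SET)
    for template, score in scores.items():
        print(f"{score:.2f}  {template!r}")
```

With a real model the scores are noisy, so a larger dev set matters. The point is only that the instruction text itself is the thing being tuned.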

Emergent abilities can show up this way: below a certain scale, models perform near chance on a task even with examples. Above that scale, performance jumps sharply, and in some cases the larger models handle the task with zero examples.