ML//Inference//Sampling

2026-02-15

Choosing the next token from the probability distribution over the vocabulary.

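Temperature: divides the logits by T before softmax. T < 1 sharpens the distribution (as T approaches 0, sampling approaches greedy, deterministic decoding); T > 1 flattens it, giving more varied ("creative") output.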
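A minimal sketch of temperature scaling, assuming plain Python lists for the logits. The function name `softmax_with_temperature` and the example logits are made up for illustration and do not come from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical logits for a 3-token vocabulary
cold = softmax_with_temperature(logits, 0.5)  # sharper: more mass on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter: mass spread more evenly
```

Comparing `cold[0]` with `hot[0]` shows the top token losing probability mass as the temperature rises.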

Top-k: only consider the k highest-probability tokens; renormalize over them before sampling.
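Top-k filtering could be sketched like this, assuming the input is already a probability list. `top_k_filter` is a hypothetical helper name, and ties are broken by index order, which is one choice among several.

```python
def top_k_filter(probs, k):
    """Keep the k highest-probability tokens, zero the rest, renormalize."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest probabilities (stable sort, so ties keep index order).
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / total
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
top2 = top_k_filter(probs, 2)   # only the first two tokens remain candidates
```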

Top-p (nucleus): only consider the smallest set of highest-probability tokens whose cumulative probability reaches p; renormalize over that set before sampling.
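A rough sketch of nucleus filtering under the same assumptions (a plain probability list, with `top_p_filter` as a hypothetical helper name). Tokens are added in descending order of probability until the running total reaches p, so the token that crosses the threshold is included.

```python
def top_p_filter(probs, p):
    """Keep the smallest top-probability set whose mass reaches p, then renormalize."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # the token that crosses the threshold is kept
            break
    total = sum(probs[i] for i in keep)
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / total
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]     # hypothetical next-token distribution
nucleus = top_p_filter(probs, 0.9)  # keeps three tokens (0.5 + 0.3 + 0.15 >= 0.9)
```

Unlike top-k, the number of surviving tokens varies with the shape of the distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus.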

Sampling introduces stochasticity: the same input can produce different outputs (each run follows a new manifold path).
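To illustrate the stochasticity, here is a small sketch using Python's `random` module. It is a simplification: it draws repeatedly from one fixed, made-up distribution, whereas a real decoder recomputes the distribution after every token. `sample_token` and `sample_sequence` are hypothetical helper names.

```python
import random

def sample_token(probs, rng):
    """Draw one token index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_sequence(probs, seed, length=20):
    """Draw `length` token indices using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(probs, rng) for _ in range(length)]

probs = [0.5, 0.3, 0.2]  # hypothetical next-token distribution

run_a = sample_sequence(probs, seed=0)
run_b = sample_sequence(probs, seed=0)  # same seed: identical to run_a
run_c = sample_sequence(probs, seed=1)  # different seed: will typically differ
```

Fixing the seed makes a run reproducible; leaving it unfixed is what lets the same input yield different outputs.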

Temperature is often raised when generating candidate outputs for RLHF or DPO, because preference data needs diverse candidates to compare.