ML//Inference//Sampling

Choosing the next token from the probability distribution over the vocabulary.


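Temperature: divides the logits by T before softmax. T < 1 sharpens the distribution (more deterministic, approaching greedy argmax as T → 0); T > 1 flattens it (more varied, "creative" output).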
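
A minimal sketch of temperature scaling, assuming plain Python lists of logits. The function name `softmax_with_temperature` and the example logits are made up for illustration, not taken from any library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-token vocabulary.
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # peaked
base = softmax_with_temperature(logits, temperature=1.0)   # unscaled softmax
flat = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```

The top token's probability shrinks as T grows, so `sharp[0] > base[0] > flat[0]`.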

Top-k: only consider the k highest-probability tokens; renormalize over those and sample from them.
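
Top-k filtering can be sketched as follows, assuming the input is already a probability list. `top_k_filter` and the example distribution are illustrative names and values only.

```python
def top_k_filter(probs, k):
    """Keep the k highest-probability entries, zero the rest, renormalize."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices sorted by probability, highest first; keep the first k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [p / kept_mass if i in keep else 0.0 for i, p in enumerate(probs)]

# Hypothetical distribution over a 4-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_k_filter(probs, k=2)  # only tokens 0 and 1 survive
```

With k=2 the surviving mass is 0.8, so the two kept tokens are rescaled to 0.5/0.8 and 0.3/0.8.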

Top-p (nucleus): only consider the smallest set of highest-probability tokens whose cumulative probability reaches at least p; the low-probability tail beyond that is dropped.
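
A rough sketch of nucleus filtering under the same assumptions (a plain probability list; `top_p_filter` is a hypothetical helper name). Tokens are added in descending probability order until the running total reaches p.

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability prefix with mass >= p, renormalize."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:  # nucleus is complete once mass reaches p
            break
    return [q / cumulative if i in keep else 0.0 for i, q in enumerate(probs)]

# Hypothetical distribution over a 4-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, p=0.7)  # 0.5 < 0.7, 0.5 + 0.3 >= 0.7: keep two
```

Unlike top-k, the number of kept tokens varies with the shape of the distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus.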

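Sampling introduces stochasticity: the same input can produce different outputs, because each sampled token changes the context for every later token and sends generation down a different path.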
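
To illustrate the stochasticity, here is a small sketch that draws token indices from a fixed distribution using the standard library's `random.Random.choices`. The helper `sample_token`, the seeds, and the distribution are illustrative assumptions.

```python
import random

def sample_token(probs, rng):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical distribution over a 4-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]

# Repeated draws from the same distribution yield different tokens.
rng = random.Random(0)
draws = [sample_token(probs, rng) for _ in range(1000)]

# Fixing the seed makes a run reproducible.
rng_a = random.Random(42)
rng_b = random.Random(42)
run_a = [sample_token(probs, rng_a) for _ in range(20)]
run_b = [sample_token(probs, rng_b) for _ in range(20)]
```

Seeding the generator is the usual way to get repeatable outputs while keeping sampling enabled; `run_a` and `run_b` match because both generators start from the same seed.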

Temperature is often cranked up when generating candidate outputs for RLHF or DPO: preference comparisons need candidates that differ from one another, so diversity matters.