ML//Inference//Sampling
Choosing the next token from the probability distribution over the vocabulary.
Temperature: divides logits by T before softmax. Low T sharpens the distribution (near-greedy as T approaches 0); high T flattens it, giving more varied output.
Top-k: only consider the k highest-probability tokens.
Top-p (nucleus): only consider the smallest set of highest-probability tokens whose cumulative probability reaches p.
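A minimal sketch of how the three knobs above could be combined, using only the Python standard library. The function names (`softmax`, `filter_top_k`, `filter_top_p`, `sample_next_token`) and the order in which the filters are applied are assumptions made for illustration, not any particular library's API.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _keep_and_renormalize(probs, kept):
    """Zero out every token not in `kept`, then rescale so the rest sums to 1."""
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def filter_top_k(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _keep_and_renormalize(probs, set(order[:k]))

def filter_top_p(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _keep_and_renormalize(probs, kept)

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token id: temperature first, then optional top-k and top-p."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding, since only the most probable token survives the filter.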
Introduces stochasticity: the same input can produce different outputs, because each sampled token changes the context and sends the rest of the generation down a different path.
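A small illustration of that stochasticity, assuming a made-up three-token distribution and a hypothetical helper `sample_tokens`. For simplicity the distribution is held fixed across steps; in a real model it would be recomputed after every sampled token.

```python
import random

def sample_tokens(probs, n, seed):
    """Draw n token ids from a fixed distribution using a seeded generator."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

probs = [0.5, 0.3, 0.2]  # made-up next-token distribution

print(sample_tokens(probs, 10, seed=0))
print(sample_tokens(probs, 10, seed=1))  # a different seed usually gives a different sequence
print(sample_tokens(probs, 10, seed=0))  # the same seed reproduces the first sequence
```

Fixing the seed is the usual way to make sampled output reproducible when debugging.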
Temperature is raised when generating candidate outputs for RLHF or DPO, where diverse candidates are needed.
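One way to see why a higher temperature yields more diverse candidates is to measure the entropy of the resulting distribution. The sketch below uses made-up logits for a four-token vocabulary; `softmax` and `entropy` are illustrative helpers, not part of any RLHF or DPO toolkit.

```python
import math

def softmax(logits, temperature):
    """Turn logits into probabilities after dividing by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means probability is spread more evenly."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [4.0, 2.0, 1.0, 0.5]  # made-up scores for a 4-token vocabulary

for t in (0.2, 1.0, 2.0):
    probs = softmax(logits, t)
    print(f"T={t}: top-token prob={max(probs):.3f}, entropy={entropy(probs):.3f} nats")
```

As T grows, the top token's share shrinks and entropy rises toward its maximum of log(vocabulary size), so repeated samples from the same prompt are less likely to coincide.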