ML//Inference//Sampling//top-p

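Nucleus sampling (top-p): sort tokens by probability, keep the smallest set of top tokens whose cumulative probability reaches p (e.g. 0.95), renormalize over that set, and sample from it.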

Adaptive: when the model is confident, the nucleus contains only a few tokens; when it is uncertain, it contains many.
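
A minimal sketch of the procedure, assuming the model's output has already been turned into a plain list of probabilities that sums to 1. The names `nucleus` and `sample_top_p` and the two toy distributions are made up for illustration; real inference code would typically do this on logits tensors.

```python
import random

def nucleus(probs, p=0.95):
    """Return indices of the smallest set of top tokens whose mass reaches p."""
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

def sample_top_p(probs, p=0.95, rng=random):
    """Sample one token index from the nucleus."""
    kept = nucleus(probs, p)
    # random.choices normalizes the weights, which renormalizes the nucleus.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

# Hypothetical toy distributions over a 4-token vocabulary.
confident = [0.96, 0.02, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(len(nucleus(confident)))  # 1 token in the nucleus
print(len(nucleus(uncertain)))  # 4 tokens in the nucleus
```

With the same p, the nucleus shrinks to a single token for the confident distribution and grows to the whole vocabulary for the uniform one.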

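Generally preferred over top-k because top-k keeps a fixed number of tokens regardless of how the probability mass is spread, while top-p adapts to the shape of the distribution.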
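
To make the contrast with top-k concrete, here is a small sketch under the same assumption (probabilities as a plain list). The helper names, k = 3, and the two example distributions are illustrative choices, not recommended settings.

```python
def _sorted_indices(probs):
    # Token indices ordered from most to least probable.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def top_k_indices(probs, k):
    """Keep a fixed number of tokens, whatever the distribution looks like."""
    return _sorted_indices(probs)[:k]

def top_p_indices(probs, p):
    """Keep tokens until their cumulative probability reaches p."""
    kept, total = [], 0.0
    for i in _sorted_indices(probs):
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return kept

peaked = [0.90, 0.06, 0.02, 0.01, 0.01]  # model is confident
flat = [0.20, 0.20, 0.20, 0.20, 0.20]    # model is uncertain

for name, probs in [("peaked", peaked), ("flat", flat)]:
    k_size = len(top_k_indices(probs, 3))
    p_size = len(top_p_indices(probs, 0.95))
    print(name, "top-k keeps", k_size, "| top-p keeps", p_size)
# peaked: top-k keeps 3, top-p keeps 2
# flat:   top-k keeps 3, top-p keeps 5
```

On the peaked distribution top-k admits a low-probability tail token that top-p excludes; on the flat one top-k discards two tokens that are just as likely as the ones it keeps.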