ML//Inference//Sampling//top-k

- Only consider the k most probable next tokens: set the probability of the rest to zero and renormalize before sampling.


k=1 reduces to greedy decoding; k=50 is a common setting. Cuts the long tail of unlikely tokens.
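
A minimal sketch of the idea in plain Python, assuming the model hands back one raw score (logit) per vocabulary entry. The function names `top_k_probs` and `sample_top_k` and the example logits are made up for illustration and are not from any particular library.

```python
import math
import random


def top_k_probs(logits, k):
    """Return sampling probabilities with everything outside the top k zeroed."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest logits (stable sort, so ties go to the lower index).
    keep = sorted(range(len(logits)), key=lambda i: -logits[i])[:k]
    # Softmax over the kept logits only; subtract the max for numerical stability.
    top = max(logits[i] for i in keep)
    weights = {i: math.exp(logits[i] - top) for i in keep}
    total = sum(weights.values())
    # Tokens outside the top k get probability 0; the rest are renormalized.
    return [weights.get(i, 0.0) / total for i in range(len(logits))]


def sample_top_k(logits, k, rng=random):
    """Draw one token index from the top-k distribution."""
    probs = top_k_probs(logits, k)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary
print(top_k_probs(logits, 1))   # all mass on index 0, i.e. greedy
print(sample_top_k(logits, 3))  # always one of indices 0, 1, 2
```

Real implementations usually do the same thing on batched tensors by masking the discarded logits to negative infinity before the softmax, which gives the same distribution as the explicit renormalization here.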