ML//Inference//Sampling//temperature

Scales logits before softmax: logits / T.


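T=0: greedy (always pick the highest-probability token); logits / T is undefined at exactly 0, so this is the limit as T approaches 0, in practice an argmax. T=1: the model's unmodified distribution. T<1: sharper. T>1: flatter, more random.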
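
A minimal sketch of how these cases might look in code, using only the Python standard library. The function name `sample_token` and its parameters are hypothetical, not taken from any particular library; treating T=0 as an argmax is an assumption about how the greedy case is handled.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Return an index into `logits`, sampled after temperature scaling.

    Hypothetical helper for illustration only.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # logits / 0 is undefined, so T=0 is handled as the greedy limit (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices accepts unnormalized weights, so no explicit division is needed.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

For example, `sample_token([1.0, 3.0, 2.0], temperature=0)` always returns index 1, while `temperature=1.0` samples from the unmodified softmax distribution.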

Controls how peaked the softmax output is: high T flattens the distribution (more varied, "creative" output), low T sharpens it (more deterministic, "confident" output).
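
The flattening and sharpening can be checked numerically. The sketch below is illustrative only: the helper name `softmax_with_temperature` and the three logits are made up, and T is assumed to be strictly positive.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for a three-token vocabulary
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

With these logits the top token's probability falls from roughly 0.87 at T=0.5 to roughly 0.67 at T=1 and roughly 0.51 at T=2, while the ranking of the tokens stays the same.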

The simplest generation knob. Low temperature for factual tasks, high for creative ones.

The inputs to softmax are called logits: raw, pre-normalization scores over the vocabulary.