ML//Inference//Sampling//temperature

2020-10-15

Scales logits before softmax: logits / T.
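
Written out as a formula, under the assumption that z_i denotes the logit for token i and V the vocabulary size (both symbols are introduced here for illustration, not taken from the note):

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad T > 0
```

At T = 1 this reduces to the ordinary softmax.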

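T=0: greedy (always pick the highest-probability token; handled as an argmax special case, since logits / 0 is undefined). T<1: sharper than the raw distribution. T=1: raw distribution. T>1: more random.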
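
A minimal sketch of how these regimes might look in code, using only the Python standard library. The function name sample_with_temperature and the rng parameter are hypothetical, and treating T=0 as argmax is an assumed convention rather than something this note specifies.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Return the index of a token drawn from softmax(logits / temperature)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed convention: logits / 0 is undefined, so T=0 means greedy argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - m) for s in scaled]
    # random.choices accepts unnormalized weights and normalizes internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Usage: sample_with_temperature([1.0, 3.0, 2.0], 0) returns index 1 (the largest logit) every time, while larger temperatures return the other indices more often.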

Controls how peaked the softmax output is: high T flattens the distribution (more varied output), low T sharpens it (probability mass concentrates on the top tokens).
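
To see the flattening and sharpening numerically, here is a small sketch that prints the probabilities and Shannon entropy for a few temperatures. The logits are made-up values for a three-token vocabulary, and the helper names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, for temperature > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.0]  # illustrative scores, not from a real model
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

As T grows, the top token's probability falls and the entropy rises; as T shrinks, the opposite happens.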

The simplest generation knob. Low temperature for factual tasks, high for creative ones.

The inputs to softmax are called logits: raw, pre-normalization scores over the vocabulary.