Speculative decoding

2023-11-10

Use a small draft model to propose several tokens ahead (k tokens, generated one at a time), then verify all of them with the large model in a single forward pass.
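
A minimal sketch of the greedy variant. It assumes each model is a hypothetical function that maps a token prefix to its argmax next token; real implementations work on logits and KV caches. The list comprehension over prefixes stands in for the large model's single batched forward pass. `speculative_greedy`, `target`, `draft` and `k` are illustrative names, not a library API.

```python
from typing import Callable, List, Tuple

# Hypothetical greedy "model": maps a token prefix to its argmax next token.
Model = Callable[[List[int]], int]


def speculative_greedy(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, number of target passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft proposes k tokens, one at a time.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target's prediction at each of the k + 1 positions. A real
        #    transformer gets all of these from ONE forward pass over
        #    out + proposal; this loop only simulates that pass.
        passes += 1
        target_next = [target(out + proposal[:i]) for i in range(k + 1)]
        # 3. Keep the longest prefix where draft and target agree, then append
        #    the target's own token: the correction at the first mismatch, or a
        #    bonus token if all k draft tokens were accepted.
        n = 0
        while n < k and proposal[n] == target_next[n]:
            n += 1
        out += proposal[:n] + [target_next[n]]
    return out[:len(prompt) + max_new], passes
```

Every emitted token is the large model's own argmax at that position, so the output matches plain greedy decoding with the large model; only the number of large-model passes changes.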

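If the draft is right (common for easy tokens like "the" or "is"), one large-model pass yields several tokens: the accepted draft prefix plus one token from the large model itself. Even when the first draft token is rejected, the step still produces that one large-model token, so a step never yields fewer tokens than ordinary decoding, though it still costs the draft's time.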
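
How many tokens a step yields depends on the acceptance rate. Under the simplifying assumption that each draft token is accepted independently with probability alpha (real acceptance is correlated across positions), the expected number of tokens per large-model pass with k draft tokens is the truncated geometric sum 1 + alpha + ... + alpha^k = (1 - alpha^(k+1)) / (1 - alpha). `expected_tokens_per_step` is a hypothetical helper.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target forward pass with k draft tokens.

    Assumes (simplification) that each draft token is accepted independently with
    probability alpha. The step emits the accepted prefix plus one target token, and
    the prefix reaches length >= i with probability alpha**i, so the expectation is
    1 + alpha + ... + alpha**k.
    """
    if alpha == 1.0:
        return float(k + 1)  # every draft token accepted, plus the bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)
```

For example, alpha = 0.8 and k = 4 gives about 3.36 tokens per large-model pass, before subtracting the cost of the draft's four forward passes.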

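Speedup without quality loss: each draft token x is accepted with probability min(1, p(x)/q(x)), where p is the large model's distribution and q the draft's; on rejection, a replacement is sampled from the leftover mass max(0, p - q), renormalized. Under this rule the output distribution is exactly the large model's. The speedup itself is not free: it depends on how often drafts are accepted, and every step still pays for the draft's forward passes.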
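
A sketch of that per-token acceptance rule for sampled (non-greedy) decoding, assuming p and q are the large and draft models' next-token distributions at one position, given as plain Python lists. `accept_or_resample` draws the outcome; `induced_distribution` computes its exact output distribution, which collapses to min(p, q) + max(0, p - q) = p. Both names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def accept_or_resample(p: Sequence[float], q: Sequence[float], x: int,
                       rng: random.Random) -> Tuple[int, bool]:
    """Verify draft token x (sampled from q, so q[x] > 0) against target p.

    Returns (token, accepted). x is kept with probability min(1, p[x] / q[x]);
    otherwise a replacement is drawn from the residual max(0, p - q), normalised.
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    # random.choices normalises the weights, so dividing by their sum is not needed.
    return rng.choices(range(len(p)), weights=residual)[0], False


def induced_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Exact distribution of the token returned by accept_or_resample when x ~ q."""
    accept = [min(pi, qi) for pi, qi in zip(p, q)]   # q[x] * min(1, p[x] / q[x])
    reject_mass = 1.0 - sum(accept)                   # total rejection probability
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    z = sum(residual)                                 # equals reject_mass
    if z == 0.0:                                      # p == q: never rejects
        return accept
    return [a + reject_mass * r / z for a, r in zip(accept, residual)]
```

In a full decoder this check runs left to right over the k draft tokens: the first rejection ends the step (later draft tokens are discarded), and if all k are accepted, one bonus token is sampled directly from p at the next position.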