ML//Training//exposure bias
Picture a student who only ever practiced with answer keys, then had to take the exam using their own messy drafts as reference. During SFT, the model always sees **correct** context (human-written training data). At inference, it sees its own previous outputs, which may contain errors.
The model was never trained on its own mistakes as context. When it generates a wrong token, that error becomes context for the next token, which makes further errors more likely: the model is now conditioning on contexts drawn from a distribution it never saw during training.
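A toy sketch of the train/inference mismatch, not a real language model. `toy_model` is a hypothetical stand-in that predicts `prev + 1` with one deliberate flaw, and the "gold" sequence is just counting from 0 to 9. The only point is to compare how the same single flaw is scored when the context is gold versus when it is the model's own output.

```python
def toy_model(prev_token):
    # Hypothetical stand-in for a model: predicts prev + 1,
    # except for one deliberate flaw when the context is 3.
    return 5 if prev_token == 3 else prev_token + 1

gold = list(range(10))  # target sequence 0, 1, ..., 9

def teacher_forced_errors(gold):
    # Training-style evaluation: context is always the gold token.
    errors = 0
    for t in range(1, len(gold)):
        pred = toy_model(gold[t - 1])
        errors += pred != gold[t]
    return errors

def free_running_errors(gold):
    # Inference-style evaluation: context is the model's previous output.
    errors = 0
    prev = gold[0]
    for t in range(1, len(gold)):
        pred = toy_model(prev)
        errors += pred != gold[t]
        prev = pred  # the (possibly wrong) prediction becomes the context
    return errors

print(teacher_forced_errors(gold), free_running_errors(gold))
```

With gold context the flaw costs one position. Fed back into itself, the same flaw puts every later token off target, because the model never gets back onto the gold sequence.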
Path dependency in the residual stream: once the model starts generating in a wrong direction, each incorrect token shifts its internal representation of the context further from familiar territory. The bad trajectory acts like an attractor that is hard to escape.
This is why extended thinking has a dark side: more generated tokens mean more opportunities for a single error to compound into a cascade via path dependency.
Related to overthinking: longer reasoning chains give exposure bias more room to accumulate.
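A back-of-the-envelope sketch of why length matters. It assumes a fixed, independent per-token error rate `eps`, which is a simplification: with exposure bias, errors are correlated and tend to get more likely after the first one, so this is closer to a lower bound on the intuition than a model of it. The function name and the example rate are illustrative, not measured values.

```python
def p_any_error(eps, n_tokens):
    # Probability of at least one error in n_tokens, assuming each token
    # independently goes wrong with probability eps (a simplification).
    return 1.0 - (1.0 - eps) ** n_tokens

for n in (10, 100, 1000):
    print(n, round(p_any_error(0.01, n), 3))
```

Even under this optimistic assumption, a 1% per-token error rate gives roughly a 63% chance of at least one error somewhere in a 100-token chain.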
Mitigation approaches: scheduled sampling (mix the model's own outputs into the training context), RLHF/DPO (train on model-generated outputs with a reward or preference signal), and tree search (explore multiple branches to escape bad paths).
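A minimal sketch of the scheduled sampling idea from the list above, using only the standard library. `toy_model`, `scheduled_sampling_inputs`, and `sample_prob` are hypothetical names for illustration. A real setup would sample from a trained network and usually raise `sample_prob` gradually over training.

```python
import random

def toy_model(prev_token):
    # Hypothetical stand-in for a model: prev + 1, with one flaw at 3.
    return 5 if prev_token == 3 else prev_token + 1

def scheduled_sampling_inputs(gold, predict_next, sample_prob, rng):
    # Build the context token fed to the model at each training step.
    # With probability sample_prob the model's own prediction is used,
    # otherwise the gold token (plain teacher forcing).
    inputs = [gold[0]]
    for t in range(1, len(gold) - 1):
        model_tok = predict_next(inputs[-1])
        use_model = rng.random() < sample_prob
        inputs.append(model_tok if use_model else gold[t])
    return inputs

gold = list(range(10))
rng = random.Random(0)  # seeded so runs are repeatable
print(scheduled_sampling_inputs(gold, toy_model, 0.0, rng))  # pure teacher forcing
print(scheduled_sampling_inputs(gold, toy_model, 0.5, rng))  # mixed context
print(scheduled_sampling_inputs(gold, toy_model, 1.0, rng))  # fully free-running
```

The training targets stay gold in every case. Only the context changes, so the model gets to practice recovering from its own mistakes.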