ML//Inference//extended thinking//tree search

2026-03-06

Beyond linear thinking: explore multiple reasoning branches, evaluate them, choose the best.

Linear thinking: token1 → token2 → token3 → answer. One path, no backtracking.

Tree search: generate N branches → evaluate each → select the best → continue from there. Like a human "considering several options".
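The generate, evaluate, select, continue loop above can be sketched in a few lines. This is a toy illustration, not how any particular model implements it: `tree_search`, `expand`, and `score` are hypothetical names, and the "reasoning chain" here is just a tuple of integers trying to sum to 7.

```python
from typing import Callable, List, Tuple

Chain = Tuple[int, ...]  # toy stand-in for a reasoning chain


def tree_search(root: Chain,
                expand: Callable[[Chain], List[Chain]],
                score: Callable[[Chain], float],
                depth: int) -> Chain:
    """Greedy tree search: expand the current chain into branches,
    score each branch, and continue from the best one."""
    chain = root
    for _ in range(depth):
        branches = expand(chain)          # generate N branches
        if not branches:
            break
        chain = max(branches, key=score)  # evaluate each, select the best
    return chain


def expand(chain: Chain) -> List[Chain]:
    # Hypothetical branch generator: try appending 1, 2, or 3.
    return [chain + (step,) for step in (1, 2, 3)]


def score(chain: Chain) -> float:
    # Hypothetical evaluator: closer to a total of 7 is better.
    return -abs(7 - sum(chain))


best = tree_search((), expand, score, depth=3)
print(best)  # (3, 3, 1)
```

Keeping only the single best branch at each level makes this the greedy special case. A beam search would keep the top k branches instead of one, at proportionally higher cost.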

o3 likely uses this: not just more thinking tokens but an active search among multiple reasoning chains. If so, that would help explain why o3 is expensive: exploring branches multiplies compute.
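A back-of-the-envelope cost model shows the multiplication. It assumes every step generates the same number of branches and that only one is kept. The function names and the numbers are illustrative, not measurements of any real model, and the cost of the evaluator itself is ignored.

```python
def linear_tokens(steps: int, tokens_per_step: int) -> int:
    """Tokens generated by a single reasoning chain."""
    return steps * tokens_per_step


def tree_search_tokens(steps: int, tokens_per_step: int, branches: int) -> int:
    """Tokens generated when each step samples `branches` candidates
    and keeps one (evaluation cost not included)."""
    return steps * tokens_per_step * branches


# Illustrative numbers only.
linear = linear_tokens(steps=10, tokens_per_step=200)
tree = tree_search_tokens(steps=10, tokens_per_step=200, branches=5)
print(linear, tree, tree // linear)  # 2000 10000 5
```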

Branches can be evaluated with a PRM (process reward model, which scores each step), self-evaluation (the model judges its own branches, as with RLAIF-style AI feedback), voting (generate N answers, pick the most frequent), or external verification (for math/code).
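Of these, voting is the simplest to show. A minimal sketch, assuming the final answers have already been extracted from N sampled chains as strings (`majority_vote` and the sample answers are made up for illustration):

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent answer and the share of votes it received."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical final answers from five sampled reasoning chains.
sampled = ["42", "41", "42", "42", "17"]
winner, share = majority_vote(sampled)
print(winner, share)  # 42 0.6
```

The vote share can double as a rough confidence signal, but it only measures agreement among samples, not correctness.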

Self-consistency bias problem: the model tends to prefer branches consistent with its own pretraining biases, not necessarily the most correct ones. It's judging its own work with its own blind spots.

Tree of Thoughts was the prompt-engineering precursor. Tree search in reasoning models is the trained, scaled, RL-optimized version.