Beyond linear thinking: explore multiple reasoning branches, evaluate them, choose the best.
Linear thinking: token1 → token2 → token3 → answer. One path, no backtracking.
Tree search: generate N branches → evaluate each → select the best → continue from there. Like a human "considering several options".
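A minimal sketch of that loop, assuming a beam-style search. `tree_search`, `expand`, and `score` are hypothetical names. The toy problem (pick steps from {1, 2, 3} that sum to a target) stands in for real reasoning steps, and `score` stands in for whatever evaluator is used.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # a partial reasoning chain (any representation)


def tree_search(
    root: S,
    expand: Callable[[S], List[S]],
    score: Callable[[S], float],
    beam_width: int,
    depth: int,
) -> S:
    """Beam-style tree search: generate branches, evaluate, keep the best, continue."""
    frontier = [root]
    for _ in range(depth):
        # Generate: every surviving node proposes its next-step branches.
        children = [child for node in frontier for child in expand(node)]
        if not children:  # nothing left to expand
            break
        # Evaluate and select: keep only the `beam_width` highest-scoring branches.
        children.sort(key=score, reverse=True)
        frontier = children[:beam_width]
    # Answer with the best branch reached.
    return max(frontier, key=score)


# Hypothetical toy problem: choose 3 steps from {1, 2, 3} whose sum hits TARGET.
TARGET = 7


def expand(steps: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    return [steps + (s,) for s in (1, 2, 3)]


def score(steps: Tuple[int, ...]) -> float:
    return -abs(TARGET - sum(steps))  # stand-in for a PRM or verifier


best = tree_search((), expand, score, beam_width=2, depth=3)
print(best)  # (3, 3, 1)
```

If `expand` returns a single child, this collapses to plain linear decoding; `beam_width` and the number of children per node are the knobs that trade compute for search.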
o3 likely uses this (OpenAI has not published its method): not just more thinking tokens, but active search among multiple reasoning chains. If so, that would explain much of o3's cost: exploring branches multiplies compute.
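Back-of-the-envelope arithmetic for the compute multiplier, assuming a beam-style search with hypothetical parameters (depth in steps, tokens per step, beam width, children per node). The numbers are illustrative only; nothing here reflects o3's actual configuration, which is not public.

```python
def tree_search_tokens(depth: int, tokens_per_step: int,
                       beam_width: int = 1, branching: int = 1) -> int:
    """Tokens generated by a beam-style search (defaults = one linear chain).

    Simplifying assumption: the beam is full from the first step, so every step
    expands `beam_width` nodes into `branching` children each. Evaluator calls
    (PRM, verifier, ...) are extra and not counted here.
    """
    return depth * beam_width * branching * tokens_per_step


linear = tree_search_tokens(depth=20, tokens_per_step=100)
tree = tree_search_tokens(depth=20, tokens_per_step=100, beam_width=4, branching=4)
print(linear, tree, tree / linear)  # 2000 32000 16.0
```

The multiplier is `beam_width * branching` per step regardless of depth, and evaluator calls come on top of it.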
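Branches can be evaluated with: a PRM (process reward model, which scores each intermediate step), self-evaluation (the model critiques its own branches, the AI-feedback idea behind RLAIF), voting (generate N answers, pick the most frequent), or external verification (for math/code, e.g. checking the answer or running tests).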
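Two of those evaluators are easy to sketch without a model: voting over final answers, and aggregating per-step PRM scores into one score per chain. The answers and step scores below are made up, and taking the minimum step score is just one common aggregation choice (product and mean are others).

```python
from collections import Counter
from typing import Dict, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Voting: the most frequent final answer wins (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def prm_chain_score(step_scores: Sequence[float]) -> float:
    """PRM-style scoring: a chain is only as strong as its weakest step."""
    return min(step_scores)


# Hypothetical: five sampled final answers...
answers = ["42", "41", "42", "42", "17"]
print(majority_vote(answers))  # 42

# ...and hypothetical per-step PRM scores (in [0, 1]) for two candidate chains.
chains: Dict[str, Sequence[float]] = {
    "A": [0.90, 0.80, 0.95],
    "B": [0.99, 0.30, 0.99],  # one shaky step sinks the whole chain
}
best_chain = max(chains, key=lambda name: prm_chain_score(chains[name]))
print(best_chain)  # A
```

Voting needs finished answers, so it can only rank complete branches; a PRM scores intermediate steps, so it can also prune a branch partway through.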
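Self-consistency bias problem: when the model evaluates its own branches (self-evaluation, or voting over its own samples), it tends to prefer branches consistent with its own pretraining biases, not necessarily the most correct ones. It is judging its own work with its own blind spots. External verification sidesteps this because the check does not come from the model.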
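A toy illustration of the bias problem, with made-up samples: if the model makes a systematic mistake on 2x + 4 = 20 (say, moving the 4 across with the wrong sign to get x = 12), its samples cluster on that wrong answer, so majority voting picks it. An external check that substitutes the answer back into the equation does not share the blind spot. The function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional


def majority_vote(answers: List[int]) -> int:
    """Pick the most frequent answer (the model's own consensus)."""
    return Counter(answers).most_common(1)[0][0]


def verified_vote(answers: List[int], check: Callable[[int], bool]) -> Optional[int]:
    """Discard answers that fail an external check, then vote among the rest."""
    passing = [a for a in answers if check(a)]
    return majority_vote(passing) if passing else None


def check(x: int) -> bool:
    # External verification: substitute back into 2x + 4 = 20.
    return 2 * x + 4 == 20


# Hypothetical samples: the model's favorite mistake (12) is the most common.
samples = [12, 8, 12, 12, 8, 9]
print(majority_vote(samples))         # 12
print(verified_vote(samples, check))  # 8
```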
Tree of Thoughts (ToT) was the prompt-engineering precursor: an external loop prompts an unmodified model to propose and evaluate steps. Tree search in reasoning models is presumably the trained, scaled, RL-optimized version.