ML//model//GPT//o3

OpenAI's second-generation reasoning model (announced December 2024). It scales extended thinking further than o1.


Configurable compute budgets: low, medium, or high thinking time. More thinking generally buys better accuracy, at higher cost and latency.
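A toy model of the accuracy/cost trade-off, assuming (purely for illustration) that a higher budget means more independently sampled answers combined by majority vote on a binary question. The mapping `EFFORT_TO_SAMPLES`, the per-sample accuracy `p`, and the unit cost are hypothetical; OpenAI has not published how o3 spends its budget.

```python
from math import comb

# Hypothetical mapping from effort level to number of sampled answers.
# Illustrative numbers only, not o3's actual configuration.
EFFORT_TO_SAMPLES = {"low": 1, "medium": 5, "high": 25}


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct,
    when each sample is correct with probability p (n is assumed odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def budget_table(p: float = 0.6, cost_per_sample: float = 1.0):
    """Return (effort, samples, accuracy, cost) rows for each effort level."""
    return [
        (effort, n, majority_vote_accuracy(p, n), n * cost_per_sample)
        for effort, n in EFFORT_TO_SAMPLES.items()
    ]


for effort, n, acc, cost in budget_table():
    print(f"{effort:>6}: samples={n:2d} accuracy={acc:.3f} cost={cost:.0f}")
```

In this sketch cost grows linearly with the number of samples while accuracy grows with diminishing returns, which is the shape the note describes.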

Likely uses tree search (unconfirmed; OpenAI has not published o3's inference procedure): not just linear thinking but active exploration of multiple reasoning branches, evaluating them and selecting the best path. If so, this would explain both the accuracy leap and the high cost.
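A minimal sketch of what searching over reasoning branches could look like, using beam search as a stand-in. This is not o3's algorithm. The `expand`, `score`, and `is_final` callbacks and the toy arithmetic task (reach 10 from 1 using "+3" and "*2") are assumptions made up for illustration; in a real system `expand` would sample next reasoning steps from the model and `score` would be a learned evaluator.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # a reasoning branch: the sequence of steps taken so far


def beam_search(
    expand: Callable[[Path], List[str]],
    score: Callable[[Path], float],
    is_final: Callable[[Path], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Path:
    """Keep the beam_width best partial paths at each depth."""
    beam: List[Path] = [()]
    for _ in range(max_depth):
        candidates: List[Path] = []
        for path in beam:
            if is_final(path):
                candidates.append(path)  # finished paths stay in the running
                continue
            candidates.extend(path + (step,) for step in expand(path))
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if all(is_final(p) for p in beam):
            break
    return max(beam, key=score)


# Toy task (hypothetical): start at 1, reach TARGET using "+3" or "*2".
TARGET = 10


def value(path: Path) -> int:
    v = 1
    for op in path:
        v = v + 3 if op == "+3" else v * 2
    return v


def expand(path: Path) -> List[str]:
    return ["+3", "*2"]


def score(path: Path) -> float:
    return -abs(TARGET - value(path))  # closer to the target is better


def is_final(path: Path) -> bool:
    return value(path) == TARGET


greedy = beam_search(expand, score, is_final, beam_width=1)
searched = beam_search(expand, score, is_final, beam_width=2)
print("greedy  :", greedy, "->", value(greedy))
print("searched:", searched, "->", value(searched))
```

With `beam_width=1` the search degenerates into a single linear chain and overshoots the target; with `beam_width=2` it keeps an alternative branch alive and reaches 10. The wider beam also evaluates more candidates per step, which is where the extra cost comes from.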

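Branch evaluation options: PRM (a process reward model that scores each step), self-evaluation (the model critiques its own branches, RLAIF-style), voting (sample N answers, pick the most frequent), external verification (check the math or run the code)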
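Three of these options can be sketched as plain selection functions over candidate answers. Everything here is illustrative: the step scores stand in for PRM outputs, `min` is just one common way to aggregate them, and the `verify` callback stands in for a math checker or test suite. Self-evaluation is omitted because it needs a model call.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Voting: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def prm_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores; min penalizes a chain for its weakest step."""
    return min(step_scores)


def best_by_prm(
    answers: Sequence[str], step_scores: Sequence[Sequence[float]]
) -> str:
    """PRM: return the answer whose reasoning chain has the best aggregate score."""
    best = max(range(len(answers)), key=lambda i: prm_score(step_scores[i]))
    return answers[best]


def first_verified(
    answers: Sequence[str], verify: Callable[[str], bool]
) -> Optional[str]:
    """External verification: return the first answer that passes the checker."""
    return next((a for a in answers if verify(a)), None)


# Made-up candidates and scores, chosen so the selectors disagree.
answers = ["42", "41", "42", "40"]
step_scores = [[0.9, 0.4], [0.8, 0.85], [0.7, 0.5], [0.2, 0.9]]

print(majority_vote(answers))                                # 42
print(best_by_prm(answers, step_scores))                     # 41
print(first_verified(answers, lambda a: int(a) * 2 == 80))   # 40
```

The selectors can pick different answers from the same candidates, which is why the choice of evaluator matters as much as the search itself.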

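Achieved unprecedented scores on ARC-AGI, the benchmark designed to resist memorization and test genuine reasoning: ARC Prize reported 75.7% on the semi-private evaluation set at low compute and 87.5% at high compute.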

DeepSeek R1 matched o1-level performance with open weights and GRPO, but o3 pushed beyond it, which may suggest that tree search or PRMs add value over group-relative scoring.
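For contrast with per-step PRM scoring, group-relative scoring in GRPO assigns each sampled response a single advantage: its reward normalized against the other responses to the same prompt. A minimal sketch, assuming scalar outcome rewards; the use of the population standard deviation and the `eps` term are choices made here for numerical safety, not details taken from the note.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group.

    Every token in a response shares that response's advantage, so there is
    no per-step credit assignment of the kind a PRM provides.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two correct (reward 1), two wrong (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # roughly [1, -1, -1, 1]
```

A response is only rewarded for being better than its siblings, so a group where every sample gets the same reward yields zero advantage everywhere and no learning signal.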