ML//model//GPT//o3
OpenAI's second-generation reasoning model (announced December 2024); scales extended thinking further than o1
Configurable compute budgets: low/medium/high thinking time. More thinking generally buys better accuracy, at higher cost.
Likely uses tree search: not just one linear chain of thought but active exploration of multiple reasoning branches, evaluating them and selecting the best path. This would explain both the accuracy leap and the high cost.
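To make the "explore branches, evaluate, select" idea concrete, here is a minimal beam-search sketch on a toy arithmetic puzzle. It is an illustration only: o3's actual search procedure is not public, and `beam_search`, `expand`, `score`, and the puzzle itself are hypothetical names and choices made up for this note. The toy scorer (distance to the target) stands in for whatever branch evaluator a real system would use.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # a reasoning path is a sequence of step labels


def beam_search(
    start: int,
    expand: Callable[[int], List[Tuple[str, int]]],
    score: Callable[[int], float],
    is_goal: Callable[[int], bool],
    beam_width: int = 2,
    max_depth: int = 6,
) -> Tuple[Path, int]:
    """Keep the `beam_width` best partial paths at each depth."""
    beam: List[Tuple[Path, int]] = [((), start)]
    for _ in range(max_depth):
        candidates: List[Tuple[Path, int]] = []
        for path, state in beam:
            for label, nxt in expand(state):
                new_path = path + (label,)
                if is_goal(nxt):
                    return new_path, nxt  # stop at the first solved branch
                candidates.append((new_path, nxt))
        # Evaluate every open branch and keep only the most promising ones.
        candidates.sort(key=lambda c: score(c[1]), reverse=True)
        beam = candidates[:beam_width]
    # No branch reached the goal: return the best-scoring partial path.
    return max(beam, key=lambda c: score(c[1]))


# Toy problem (assumption for illustration): reach TARGET from 1 using +3 or *2.
TARGET = 22


def expand(state: int) -> List[Tuple[str, int]]:
    return [("+3", state + 3), ("*2", state * 2)]


def score(state: int) -> float:
    return -abs(TARGET - state)  # closer to the target is better


path, final_state = beam_search(1, expand, score, lambda s: s == TARGET)
print(path, final_state)
```

Cost grows with `beam_width` times the branching factor at every depth, which matches the note's point that wider exploration is more accurate but more expensive than a single linear chain.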
Branch evaluation options: PRM (process reward model; scores each step), self-evaluation (the model critiques its own output, as in RLAIF), voting (sample N answers, pick the most frequent), external verification (check math/code with a tool)
Achieved unprecedented scores on ARC-AGI (reported 75.7% at low compute and 87.5% at high compute on the semi-private evaluation set), the benchmark designed to resist memorization and test genuine reasoning.
DeepSeek R1 matched o1-level performance with open weights and GRPO, but o3 pushed beyond that, which may suggest that tree search or PRMs add value over group-relative scoring.
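For reference, "group-relative scoring" in GRPO means each sampled answer is judged against the other answers to the same prompt: its advantage is its reward minus the group mean, divided by the group standard deviation. A minimal sketch follows; the function name, the `eps` guard, and the use of the population standard deviation are choices made here for illustration, not details taken from any particular implementation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, rewarded 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive advantage, wrong ones negative
```

Note that the reward here is a single outcome score per answer: no step is scored on its own, which is the contrast with PRM-style branch evaluation drawn in the line above.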