ML//model//GPT//o3

2026-03-05

OpenAI's second-generation reasoning model (announced December 2024). Scales extended thinking further than o1.

Configurable compute budgets: low/medium/high thinking time. More thinking generally buys better accuracy at higher cost.

Likely uses tree search (not confirmed by OpenAI): instead of thinking along a single linear chain, the model actively explores multiple reasoning branches, evaluates them, and selects the best path. If so, this would explain both the accuracy leap and the high cost.
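To make the branch-exploration idea concrete, here is a minimal beam-search sketch over a toy problem. It is not o3's actual method, which is not public. The names `expand`, `score`, and `beam_search` are hypothetical: `expand` stands in for sampling next reasoning steps from a model, and `score` stands in for whatever evaluator ranks partial paths.

```python
from typing import List, Tuple

# A "path" is a tuple of intermediate values, standing in for a chain of reasoning steps.
Path = Tuple[int, ...]

def expand(path: Path) -> List[Path]:
    """Propose candidate next steps (toy: add 3 or double the last value)."""
    last = path[-1]
    return [path + (last + 3,), path + (last * 2,)]

def score(path: Path, target: int) -> float:
    """Toy evaluator: paths ending closer to the target score higher."""
    return -abs(target - path[-1])

def beam_search(start: int, target: int, beam_width: int = 2, max_depth: int = 6) -> Path:
    """Keep the `beam_width` best partial paths at every depth."""
    beam: List[Path] = [(start,)]
    for _ in range(max_depth):
        if beam[0][-1] == target:  # best path already solves the problem
            break
        candidates = [child for path in beam for child in expand(path)]
        candidates.sort(key=lambda p: score(p, target), reverse=True)
        beam = candidates[:beam_width]
    return beam[0]

# With two branches kept alive the search reaches 10; with one (greedy) it overshoots.
wide = beam_search(1, 10, beam_width=2)
greedy = beam_search(1, 10, beam_width=1)
```

Cost grows with `beam_width` times the branching factor at every depth, which is the toy version of the accuracy/cost trade-off described above.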

Branch evaluation options: PRM (process reward model: score each step), self-evaluation (RLAIF), voting (sample N answers, pick the most frequent), external verification (math/code).
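Three of these options can be sketched in a few lines. This is an illustrative sketch only: `toy_step_scorer` is a hypothetical stand-in for a trained PRM, aggregating step scores with `min` is one common choice among several, and `first_verified` assumes some external checker exists for the task.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

def majority_vote(answers: Sequence[str]) -> str:
    """Voting: given N sampled final answers, return the most frequent one."""
    return Counter(answers).most_common(1)[0][0]

def prm_score(steps: Sequence[str], step_scorer: Callable[[str], float]) -> float:
    """PRM-style scoring: rate every step, then aggregate with min
    (a chain is treated as only as strong as its weakest step)."""
    return min(step_scorer(step) for step in steps)

def best_by_prm(chains: Sequence[Sequence[str]],
                step_scorer: Callable[[str], float]) -> Sequence[str]:
    """Select the reasoning chain with the highest aggregated step score."""
    return max(chains, key=lambda chain: prm_score(chain, step_scorer))

def first_verified(candidates: Sequence[str],
                   check: Callable[[str], bool]) -> Optional[str]:
    """External verification: return the first candidate the checker accepts."""
    for candidate in candidates:
        if check(candidate):
            return candidate
    return None

def toy_step_scorer(step: str) -> float:
    """Hypothetical scorer: distrust any step that admits to guessing."""
    return 0.1 if "guess" in step else 0.9

voted = majority_vote(["42", "41", "42", "7"])
chosen = best_by_prm([["2+2=4", "4*3=12"], ["2+2=4", "guess 13"]], toy_step_scorer)
verified = first_verified(["13", "12"], lambda a: int(a) == 4 * 3)
```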

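Achieved unprecedented scores on ARC-AGI, a benchmark designed to resist memorization and test genuine reasoning: a reported 75.7% on the semi-private evaluation set in the low-compute configuration and 87.5% in the high-compute configuration.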

DeepSeek R1 matched o1-level performance with open weights and GRPO (Group Relative Policy Optimization), but o3 pushed beyond that, which may suggest that tree search or PRMs add value over group-relative scoring.
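For contrast with step-level scoring, group-relative scoring can be sketched as follows: each sampled answer's reward is normalized against the mean and standard deviation of its own group, so the signal is per answer rather than per step. The function name, the `eps` guard, and the use of the population standard deviation are assumptions made for this sketch; real implementations differ in such details.

```python
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its group: (r - mean) / (std + eps).

    `eps` avoids division by zero when every answer in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt: two correct (reward 1), two wrong (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# A group where every answer earns the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct answers receive a positive advantage and wrong ones a negative advantage of equal size, and no separate step-level judge is involved.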