ML//model//GPT//o1

2026-03-05

OpenAI's first frontier reasoning model (September 2024), the model that proved extended thinking scales.

Uses RL on chain of thought: trained to generate high-quality reasoning chains. Reward signal = is the final answer correct? Likely also uses PRMs (process reward models: score each reasoning step, not just the outcome).
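Minimal sketch of the two reward styles, to make the outcome-vs-process distinction concrete. Everything here is an assumption for illustration: `outcome_reward`, `process_reward`, and `toy_step_scorer` are hypothetical names, the exact-match check stands in for a real answer verifier, and taking the minimum over step scores is just one common aggregation choice. Not o1's actual training setup, which is unpublished.

```python
from typing import Callable, List


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome-based reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_reward(steps: List[str], score_step: Callable[[str], float]) -> float:
    """Process-based reward: score every step, aggregate with min.

    Using min means one bad step sinks the whole chain (assumed aggregation).
    """
    if not steps:
        return 0.0
    return min(score_step(step) for step in steps)


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a learned PRM: flags steps containing '??'."""
    return 0.1 if "??" in step else 0.9


chain = ["2 + 2 = 4", "4 * 3 = ?? 13", "so the answer is 12"]
print(outcome_reward("12", "12"))              # 1.0: final answer is right
print(process_reward(chain, toy_step_scorer))  # 0.1: one step is flagged
```

The chain gets full outcome reward despite a flawed middle step; the process reward catches it. That gap is the usual argument for PRMs.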

Core insight: more test-time compute = better answers. Instead of making the model bigger, let it think longer.
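Toy illustration of why spending more inference compute can help. Caveat: this uses majority voting over independent samples, which is a different and simpler mechanism than o1's (o1 lengthens a single reasoning chain). Assumptions: each sample is independently correct with probability `p`, and all wrong samples agree on the same wrong answer (pessimistic, binary case). `majority_vote_accuracy` is a hypothetical helper name.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Assumes odd n (no ties) and per-sample accuracy p.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    need = n // 2 + 1  # smallest count that forms a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5, 11, 21):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With `p` above 0.5, accuracy climbs as `n` grows: same model, more samples, better answers. With `p` below 0.5 the trend reverses, so extra compute only helps when the model is already more right than wrong.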

The thinking is hidden from the user: the model generates internal reasoning tokens that are discarded before showing the response.

Dominated math, coding, and science benchmarks: outperformed GPT-4o on GPQA, MATH, and competition-level problems (AIME, Codeforces).

Overthinking weakness: reportedly can do worse than GPT-4-class non-reasoning models on simple common-sense questions. Hypothesis: extended thinking on a trivial problem pulls the model away from the obvious answer, and it reasons its way into a wrong one.

OpenAI didn't publish full technical details. But DeepSeek R1 published its pipeline (SFT + GRPO) and roughly matched o1 on reasoning benchmarks, suggesting the approach isn't magic.
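Sketch of the core of GRPO (Group Relative Policy Optimization): sample a group of answers for the same prompt, then use each answer's reward relative to the group as its advantage, so no separate value network is needed. Assumptions: `group_relative_advantages` is a hypothetical name, population standard deviation is used, and `eps` is added to avoid division by zero. The clipped policy-gradient update and KL penalty are omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, rewarded 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Correct answers get positive advantage, wrong ones negative. If every answer in the group gets the same reward, all advantages are zero and that prompt contributes no learning signal.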