ML//benchmark//MATH

2026-03-08

- Competition-level math problems drawn from contests such as AMC 10, AMC 12, and AIME.


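Harder variants and successors: MATH Level 5 (the hardest of MATH's five difficulty levels) and FrontierMath (a separate benchmark of new, unpublished problems reaching research level; the problems have known answers that can be checked automatically, so they are not open problems).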
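Selecting the Level 5 subset is a filter on each problem's difficulty tag. A minimal sketch, assuming records shaped like the original MATH JSON release (a `level` string such as `"Level 5"` alongside `problem`, `type`, and `solution`). The toy records and the helper `level5_subset` are hypothetical, and their difficulty tags are made up for illustration.

```python
from collections import Counter

# Hypothetical toy records; field names assume the original MATH JSON layout.
records = [
    {"problem": "Compute $2+3$.", "level": "Level 1",
     "type": "Prealgebra", "solution": "$2+3=\\boxed{5}$."},
    {"problem": "Find the remainder when $2^{10}$ is divided by $7$.",
     "level": "Level 5", "type": "Number Theory",
     "solution": "$2^{10}=1024=7\\cdot 146+2$, so the remainder is $\\boxed{2}$."},
    {"problem": "Evaluate $\\frac{1}{2}+\\frac{1}{4}$.", "level": "Level 5",
     "type": "Algebra", "solution": "The sum is $\\boxed{\\frac{3}{4}}$."},
]


def level5_subset(rows):
    """Keep only problems tagged with MATH's hardest difficulty level."""
    return [row for row in rows if row.get("level") == "Level 5"]


hard = level5_subset(records)
print(len(hard))  # 2
print(Counter(row["type"] for row in hard))  # per-subject counts in the subset
```

The per-subject counts are a quick check that a filtered subset is not dominated by a single topic.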

One of the clearest demonstrations that chain-of-thought prompting and reasoning models actually help: accuracy jumped dramatically as each was introduced.
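Those accuracy numbers come from comparing the final answer a model writes inside `\boxed{...}` with the one in the reference solution. A minimal sketch of that scoring, assuming exact match after light normalization (whitespace removal and `\dfrac` to `\frac`); published graders normalize far more aggressively, so treat `last_boxed`, `normalize`, and `accuracy` as hypothetical helpers rather than the official evaluation code.

```python
def last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are matched by depth so nested groups like \\frac{3}{4} survive.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no answer


def normalize(ans):
    """Deliberately light normalization (an assumption, not the official rules)."""
    return ans.replace(" ", "").replace("\\dfrac", "\\frac")


def accuracy(predictions, references):
    """Fraction of model outputs whose boxed answer matches the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align one-to-one")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = last_boxed(pred), last_boxed(ref)
        if p is not None and r is not None and normalize(p) == normalize(r):
            correct += 1
    return correct / len(references)


refs = ["$1+1=\\boxed{2}$.", "so $x=\\boxed{\\frac{3}{4}}$."]
preds = ["The answer is \\boxed{2}.", "I get \\boxed{0.75}."]
print(accuracy(preds, refs))  # 0.5
```

The second prediction is mathematically right but scored wrong, because `0.75` and `\frac{3}{4}` differ as strings; this is why answer normalization and equivalence checking matter when comparing reported MATH accuracies.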