ML//benchmark//MATH
- Competition-level math problems from AMC and AIME contests.
Frontier variants: MATH level 5 (hardest subset), FrontierMath (original, unpublished research-level problems).
The benchmark most often cited as evidence that chain-of-thought prompting and reasoning models actually help: both produced dramatic accuracy jumps.