ML//benchmark//MMLU
- Massive Multitask Language Understanding: 57 subjects from elementary math to professional law.
Four-option multiple choice, scored by accuracy. GPT-4: ~86% (5-shot), approaching the estimated human expert level (~90%).
Heavy contamination concerns: some questions appear verbatim in training data, which can inflate scores.
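A rough way to probe for verbatim contamination is to check whether long word n-grams from a benchmark question also occur in a sample of training text. The sketch below is illustrative only: the function names, the 8-gram window, and the normalization are assumptions made for this note, not the method used in any particular contamination study.

```python
import re

def normalize(text: str) -> list[str]:
    """Lowercase and keep only alphanumeric tokens so punctuation changes don't hide a match."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens: list[str], n: int) -> set[str]:
    """All contiguous n-token windows, joined into strings."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(question: str, corpus_docs: list[str], n: int = 8) -> bool:
    """Flag a question if any of its n-grams appears verbatim in a corpus document.

    n=8 is an arbitrary illustrative threshold (assumption), not a standard value.
    """
    q_tokens = normalize(question)
    if not q_tokens:
        return False
    if len(q_tokens) < n:
        # Question shorter than the window: fall back to whole-question containment,
        # padded with spaces so matches respect token boundaries.
        q = " " + " ".join(q_tokens) + " "
        return any(q in " " + " ".join(normalize(doc)) + " " for doc in corpus_docs)
    q_grams = ngrams(q_tokens, n)
    return any(q_grams & ngrams(normalize(doc), n) for doc in corpus_docs)

# Hypothetical usage: one made-up question and two made-up "training" snippets.
question = "Which of the following is the capital of France, a country in Western Europe?"
leaked = ["quiz dump: Which of the following is the capital of France, a country in western europe? A. Paris"]
clean = ["The capital of France is Paris."]
print(is_contaminated(question, leaked), is_contaminated(question, clean))
```

Exact n-gram matching only catches verbatim or near-verbatim leaks; paraphrased or translated copies of a question would pass this check.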