ML//benchmark//MMLU

- Massive Multitask Language Understanding: 57 subjects from elementary math to professional law.


Four-option multiple choice, scored by accuracy (random guessing gives 25%). GPT-4: ~86% (5-shot), approaching the estimated human expert level (~90%).
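Scoring sketch: since the format is multiple choice, a headline number like the one above is plain accuracy over answer letters. The function names and the `(subject, predicted, gold)` record layout below are assumptions for illustration, not taken from any official evaluation harness.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def _match(predicted: str, gold: str) -> bool:
    # Compare answer letters, ignoring case and surrounding whitespace.
    return predicted.strip().upper() == gold.strip().upper()


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions where the predicted letter equals the gold letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    return sum(_match(p, a) for p, a in zip(predictions, answers)) / len(answers)


def per_subject_accuracy(
    records: Iterable[tuple[str, str, str]],
) -> dict[str, float]:
    """Accuracy per subject from (subject, predicted, gold) records (hypothetical layout)."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for subject, predicted, gold in records:
        totals[subject] += 1
        hits[subject] += int(_match(predicted, gold))
    return {subject: hits[subject] / totals[subject] for subject in totals}
```

Per-subject numbers are worth keeping next to the overall score: a single average can hide weak subjects.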

Heavy contamination concerns: some questions appear verbatim in training data, so a high score may partly reflect memorization rather than understanding.
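Contamination check sketch: the simplest test for verbatim leakage is an exact substring search after light normalization. This is a minimal illustration under assumptions (hypothetical function names, a corpus small enough to hold in memory); it only catches verbatim copies, not paraphrases or reformatted questions.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences do not hide a match.
    return re.sub(r"\s+", " ", text.lower()).strip()


def verbatim_contaminated(
    questions: Iterable[str], corpus_docs: Iterable[str]
) -> list[str]:
    """Return the questions whose normalized text occurs inside any corpus document."""
    docs = [normalize(doc) for doc in corpus_docs]
    flagged = []
    for question in questions:
        needle = normalize(question)
        # Skip empty questions: an empty string is a substring of everything.
        if needle and any(needle in doc for doc in docs):
            flagged.append(question)
    return flagged
```

Usage: `verbatim_contaminated(benchmark_questions, training_sample)` returns the flagged questions; their share of the benchmark gives a rough lower bound on verbatim leakage in that sample.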