ML//benchmark//HumanEval
164 Python programming problems with unit tests (OpenAI)
Measures pass@k: the probability that at least one of k sampled solutions to a problem passes all of its unit tests.
More objective than multiple choice — code either passes the tests or it doesn't.
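As an illustration of the pass@k metric described above, here is a minimal sketch of the commonly used unbiased estimator: draw n samples per problem, count the c that pass, and compute 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` and the `(n, c)` input format are assumptions made for this example, not part of any official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed all unit tests
    k: number of attempts being scored (1 <= k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failing samples;
    # it is 0 when fewer than k samples failed, which gives pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given hypothetical (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to c / n, the fraction of samples that pass. Generating more than k samples per problem (n > k) lowers the variance compared with drawing exactly k samples and checking whether any of them passes.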