ML//benchmark//HumanEval

2023-07-12

164 Python programming problems, each with its own unit tests (released by OpenAI).

Measures pass@k: the probability that at least one of k generated solutions to a problem passes all of its unit tests.
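In practice pass@k is usually estimated rather than measured directly: generate n >= k samples per problem, count the c samples that pass, and compute 1 - C(n-c, k) / C(n, k), then average over problems. The sketch below assumes that estimator; the function names `pass_at_k` and `mean_pass_at_k` are illustrative and not part of any official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated
    c: number of those samples that passed all unit tests
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, any k-subset must contain a pass.
    if n - c < k:
        return 1.0
    # 1 minus the chance that a random k-subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 this reduces to c / n, the fraction of samples that pass.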

More objective than multiple choice: code either passes the tests or it doesn't.
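A minimal sketch of that pass/fail check, assuming the tests are supplied as source code defining a `check(candidate)` function made of assertions. The helper name `passes_tests` and the toy `add` problem are hypothetical. A real harness would also sandbox the code and enforce a timeout, since `exec` runs untrusted code directly.

```python
def passes_tests(candidate_src: str, test_src: str, entry_point: str) -> bool:
    """Return True only if the candidate passes every assertion in the tests."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # defines the candidate function
        exec(test_src, namespace)       # defines check(candidate)
        namespace["check"](namespace[entry_point])
    except Exception:
        # Failed assertion, runtime error, syntax error, or missing name.
        return False
    return True


# Toy problem (hypothetical, not taken from the benchmark).
tests = (
    "def check(candidate):\n"
    "    assert candidate(1, 2) == 3\n"
    "    assert candidate(-1, 1) == 0\n"
)
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

good_result = passes_tests(good, tests, "add")
bad_result = passes_tests(bad, tests, "add")
```

The result is binary per sample: there is no partial credit for passing some of the assertions.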