Evaluation
MMLU
A multiple-choice benchmark testing model knowledge across 57 academic and professional subjects.
Definition
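MMLU (Massive Multitask Language Understanding) tests a model's knowledge and reasoning with four-option multiple-choice questions spanning 57 subjects, from elementary math and history to professional law and medicine. Results are usually reported as accuracy, the fraction of questions answered correctly. It became a standard benchmark for general-purpose LLMs and appears in most evaluation harnesses. Top models have largely saturated it, leaving little headroom to tell them apart, which has prompted harder successor benchmarks such as MMLU-Pro.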
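As a rough illustration of how a multiple-choice benchmark of this kind is scored, the sketch below formats each question with lettered options, asks a model for a letter, and computes accuracy per subject and overall. Everything in it is an assumption made for illustration: the item layout (`subject`, `question`, `choices`, `answer` as an index), the prompt format, the `predict` callable, and the toy questions, none of which come from the real dataset or from any particular evaluation harness.

```python
from collections import defaultdict

LETTERS = "ABCD"  # MMLU questions have four answer options


def format_prompt(question, choices):
    """Render one question as a lettered multiple-choice prompt (hypothetical format)."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def score(items, predict):
    """Return (per-subject accuracy, overall accuracy).

    `predict` is any callable mapping a prompt string to one of "A".."D";
    it stands in for a real model call.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        prompt = format_prompt(item["question"], item["choices"])
        guess = predict(prompt)
        total[item["subject"]] += 1
        if guess == LETTERS[item["answer"]]:
            correct[item["subject"]] += 1
    per_subject = {subject: correct[subject] / total[subject] for subject in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_subject, overall


# Toy items written for this sketch; they are NOT taken from MMLU.
TOY_ITEMS = [
    {"subject": "elementary_mathematics", "question": "What is 2 + 3?",
     "choices": ["5", "4", "6", "7"], "answer": 0},
    {"subject": "elementary_mathematics", "question": "What is 10 / 2?",
     "choices": ["2", "5", "8", "20"], "answer": 1},
    {"subject": "toy_history", "question": "In which year did World War II end?",
     "choices": ["1945", "1939", "1918", "1950"], "answer": 0},
]


def always_a(prompt):
    """Trivial baseline that always answers "A"."""
    return "A"


per_subject, overall = score(TOY_ITEMS, always_a)
print(per_subject, overall)
```

With four options per question, a random guesser scores about 25% on average, which is the floor against which reported accuracies are read. Real harnesses differ in details such as few-shot examples and whether they compare answer-letter probabilities or parse generated text, so scores are not always comparable across setups.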