Humanity's Last Exam
A very hard benchmark of expert-level questions for testing frontier AI.
Definition
Humanity's Last Exam is a deliberately difficult benchmark of expert-level questions spanning many fields, built to test the most capable AI systems. It aims to measure broad reasoning and knowledge at a difficulty level where older benchmarks, on which the top models now score close to the ceiling, no longer distinguish those models from one another.