Evaluation

Humanity's Last Exam

A very hard benchmark of expert-level questions for testing frontier AI.

Definition

Humanity's Last Exam is a deliberately difficult benchmark of expert-level questions spanning many fields, built to test the most capable AI systems. It aims to measure broad reasoning and knowledge at a difficulty level where older benchmarks have saturated and no longer separate the top models well.

Related terms

Benchmark, Evaluation, Frontier Model, Reasoning Model