Evaluation

GPQA

A benchmark of hard, expert-written science questions resistant to simple web search.

Definition

GPQA (Graduate-Level Google-Proof Q&A) is a benchmark of difficult, expert-written multiple-choice science questions in physics, chemistry, and biology. The questions are designed to remain hard for non-experts even with web access, so a high score is better explained by reasoning over domain knowledge than by simple lookup. It is used to probe advanced scientific reasoning in strong models as easier benchmarks become saturated.
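Because the benchmark is multiple-choice, scoring reduces to accuracy over the selected options. The sketch below is a minimal illustration of that idea, not the benchmark's official harness: the `Question` type, the `present` and `accuracy` helpers, and the placeholder item are all hypothetical, and it assumes four options per question, which puts the random-guess baseline at 25%.

```python
import random
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical container for one multiple-choice item."""
    prompt: str
    correct: str
    distractors: list[str]


def present(question: Question, rng: random.Random) -> tuple[dict[str, str], str]:
    """Shuffle the options and return (letter -> option text, correct letter).

    Shuffling keeps the correct answer from always sitting in the same slot.
    """
    options = [question.correct, *question.distractors]
    rng.shuffle(options)
    letters = "ABCD"[: len(options)]
    answer_letter = letters[options.index(question.correct)]
    return dict(zip(letters, options)), answer_letter


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of predicted letters that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder item for illustration only; not taken from the benchmark.
    item = Question(
        prompt="Placeholder question text",
        correct="right answer",
        distractors=["wrong 1", "wrong 2", "wrong 3"],
    )
    options, key = present(item, rng)
    print(options, key)
    print(accuracy(["A", "B", "C"], ["A", "B", "D"]))
```

When reading a reported score, it helps to compare it against the 25% chance baseline implied by four options rather than against zero.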