
BIG-bench

A large, collaborative benchmark of hundreds of diverse and difficult tasks.

Definition

BIG-bench (the Beyond the Imitation Game benchmark) is a collaborative benchmark of more than 200 diverse tasks contributed by hundreds of researchers. It was built to probe capabilities that may appear only at larger model scales and to track progress on problems that remain hard for AI systems. A curated subset, BIG-bench Hard (BBH), collects 23 tasks on which earlier language models did not outperform the average human rater.