Evaluation
BIG-bench
A large, collaborative benchmark of hundreds of diverse and difficult tasks.
Definition
BIG-bench (the Beyond the Imitation Game benchmark) is a collaborative benchmark of more than 200 diverse tasks contributed by hundreds of researchers. It was designed to probe capabilities that may emerge only at larger model scales and to track progress on problems that remain hard for current AI systems. A curated subset, BIG-bench Hard (BBH), collects 23 tasks on which earlier language-model evaluations did not outperform the average human rater.
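The sketch below shows roughly how a benchmark built from many small tasks can be scored. It assumes a simplified, BIG-bench-style JSON task made of "examples" with "input" and "target" fields. The functions `exact_str_match` and `multiple_choice_grade` borrow the names of BIG-bench metrics but are hypothetical implementations written for illustration. The task data and `toy_model` are invented stand-ins, not part of BIG-bench or its tooling.

```python
import json
from typing import Callable, Dict

# Toy task loosely modeled on a BIG-bench-style JSON task (assumption: only the
# "name", "metrics", and "examples" fields are used; all examples are invented).
TASK_JSON = """
{
  "name": "toy_arithmetic",
  "metrics": ["exact_str_match"],
  "examples": [
    {"input": "What is 2 + 3?", "target": "5"},
    {"input": "What is 7 - 4?", "target": "3"},
    {"input": "What is 6 * 2?", "target": "12"}
  ]
}
"""


def exact_str_match(prediction: str, target: str) -> float:
    """Return 1.0 if prediction and target match after trimming whitespace, else 0.0."""
    return float(prediction.strip() == target.strip())


def multiple_choice_grade(choice_scores: Dict[str, float], target_scores: Dict[str, float]) -> float:
    """Pick the choice the model scores highest and return that choice's target score."""
    best = max(choice_scores, key=choice_scores.get)
    return target_scores[best]


def evaluate_generative(task: dict, model: Callable[[str], str]) -> float:
    """Mean exact-match score of a text-generating model over a task's examples."""
    scores = [exact_str_match(model(ex["input"]), ex["target"]) for ex in task["examples"]]
    return sum(scores) / len(scores)


# Hypothetical stand-in for a language model: answers from a lookup table, with one mistake.
CANNED_ANSWERS = {
    "What is 2 + 3?": "5",
    "What is 7 - 4?": " 3 ",
    "What is 6 * 2?": "11",
}


def toy_model(prompt: str) -> str:
    return CANNED_ANSWERS.get(prompt, "")


if __name__ == "__main__":
    task = json.loads(TASK_JSON)
    print(f"{task['name']}: {evaluate_generative(task, toy_model):.3f}")  # 2 of 3 correct
    # Multiple-choice case: hypothetical model log-probabilities per option vs. target scores.
    grade = multiple_choice_grade(
        {"Paris": -0.2, "Rome": -2.1, "Berlin": -3.0},
        {"Paris": 1.0, "Rome": 0.0, "Berlin": 0.0},
    )
    print(f"multiple-choice grade: {grade}")
```

Running it prints a score of 0.667 for the toy task, because two of the three canned answers match. It then prints a multiple-choice grade of 1.0. A real harness would aggregate per-task scores like these across the whole suite. The official BIG-bench code may normalize or combine metrics differently.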