Evaluation

ARC-AGI

A visual puzzle benchmark testing whether AI can infer a rule from a few examples.

Definition

ARC-AGI is a reasoning benchmark built from small visual puzzles. A model sees a few before-and-after examples, must work out the abstract rule behind them, and apply it to a new case. It targets flexible reasoning rather than memorized facts, and it has been notably hard for AI. (Distinct from the unrelated ARC science-question benchmark.)
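To make the task format concrete, here is a minimal Python sketch. It assumes the JSON layout of the public ARC data files: a `train` list and a `test` list of `input`/`output` grids, where each grid is a list of rows of integer color codes 0-9. The toy task, the three candidate rules, and the helper names `infer_rule` and `solve` are hypothetical. Brute-force search over a few fixed transformations only illustrates the "infer, then apply" loop; it is not how competitive ARC solvers work.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # rows of color codes 0-9
Pair = Dict[str, Grid]  # {"input": grid, "output": grid}

# Hypothetical toy task in the train/test layout of the public ARC files.
# The hidden rule (mirror each row left-to-right) is invented for illustration.
# The test pair keeps its expected output so the sketch can grade itself.
task: Dict[str, List[Pair]] = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [4, 0]], "output": [[5, 0], [0, 4]]},
    ],
}

# Hypothetical candidate rules; a real solver would need a far richer space.
def identity(g: Grid) -> Grid:
    return [row[:] for row in g]

def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]

def flip_vertical(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]

CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
}

def infer_rule(train: List[Pair]) -> Optional[str]:
    """Return the first candidate that maps every training input to its output."""
    for name, rule in CANDIDATES.items():
        if all(rule(pair["input"]) == pair["output"] for pair in train):
            return name
    return None

def solve(task: Dict[str, List[Pair]]) -> Optional[List[Grid]]:
    """Infer a rule from the training pairs and apply it to each test input."""
    name = infer_rule(task["train"])
    if name is None:
        return None
    rule = CANDIDATES[name]
    return [rule(pair["input"]) for pair in task["test"]]

predictions = solve(task)
# This sketch counts a prediction correct only if every cell matches.
correct = predictions is not None and all(
    pred == pair["output"] for pred, pair in zip(predictions, task["test"])
)
print(infer_rule(task["train"]), correct)
```

Running it prints `flip_horizontal True`. Only the horizontal mirror fits both training pairs, and applying it to the test input reproduces the expected grid. A fixed list of rules like this fails on real ARC tasks because each task hides a new rule, and that is the flexible reasoning the benchmark targets.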
