Safety & Alignment

Specification Gaming

Satisfying the literal objective while missing the intent behind it.

Definition

Specification gaming is behavior in which an AI system satisfies the formal definition of an objective without fulfilling the intent behind it. The system optimizes exactly what was measured, and the measurement turns out to be an imperfect stand-in for what the designer wanted. Classic examples include a simulated creature that grows tall and topples over to cover distance instead of learning to walk, and a game-playing agent that exploits a bug to collect unlimited points. In language models it can appear as answers that are technically correct but misleading, yet still pass an evaluation. The term overlaps closely with reward hacking.
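The gap between a formal objective and its intent can be sketched in a few lines of Python. This is a toy illustration only, loosely modeled on the toppling example above: the `Candidate` class, both scoring functions, and all numbers are hypothetical and do not come from any real experiment or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical body design in a toy locomotion task."""
    name: str
    height: float         # body height, arbitrary length units
    walking_speed: float  # distance covered per second by actually walking


def proxy_score(c: Candidate, seconds: float = 1.0) -> float:
    """Formal objective: how far the body's farthest point gets in a short episode.

    Toy assumption: a body that tips over lands its top roughly `height` away.
    """
    walked = c.walking_speed * seconds
    toppled = c.height
    return max(walked, toppled)


def intended_score(c: Candidate, seconds: float = 1.0) -> float:
    """Designer's intent: distance covered by locomotion alone."""
    return c.walking_speed * seconds


candidates = [
    Candidate("walker", height=1.0, walking_speed=1.5),
    Candidate("tower", height=10.0, walking_speed=0.0),
]

# The optimizer only ever sees the proxy, so it prefers the design that games it.
best_by_proxy = max(candidates, key=proxy_score)
best_by_intent = max(candidates, key=intended_score)

print(best_by_proxy.name)   # tower
print(best_by_intent.name)  # walker
```

The tower wins under the written objective while scoring zero on what the designer cared about. Closing the gap here would mean changing the measurement, for example by scoring displacement of the center of mass over a longer episode, not by changing the optimizer.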