Safety & Alignment

Superalignment

The challenge of aligning AI systems that may exceed human ability to supervise them.

Definition

Superalignment refers to the challenge of supervising and aligning AI systems that may eventually surpass human ability, at which point ordinary human oversight could be too slow or unreliable to catch mistakes. Proposed approaches include using AI to help evaluate and oversee other AI, and developing oversight methods that scale with capability. The framing treats present-day alignment techniques as possibly insufficient for far more capable future systems.