All terms
Safety & Alignment
Scalable Oversight
Supervision methods that keep working as AI systems grow more capable than their reviewers.
Definition
Scalable oversight is the problem of supervising and evaluating AI systems whose outputs become too complex or capable for unaided humans to check directly. Proposed approaches use AI assistance to help humans review work, such as having models debate, critique, or decompose a task into verifiable parts. It is closely tied to superalignment and to keeping training signals reliable as models improve.