Safety & Alignment

Control Problem

The challenge of keeping advanced AI systems under reliable human control.

Definition

The control problem is the challenge of keeping advanced AI systems under reliable human control, especially as they become more capable and autonomous. It concerns whether developers can correct, constrain, or shut down a system that pursues goals in unintended ways. It motivates research into corrigibility, scalable oversight, and alignment so that capable systems remain steerable and accountable.
