Safety & Alignment

Constitutional AI

An alignment method where a model critiques and revises its own answers against written principles.

Definition

Constitutional AI is an alignment approach from Anthropic in which a model critiques and revises its own responses against an explicit written set of principles (a constitution), then trains on the improved outputs. By generating its own feedback rather than relying on people to label every harmful case (an approach called RLAIF), making alignment more transparent, customizable, and scalable.

Constitutional AI

Definition

Related terms