
Jailbreak Taxonomy

A classification of the techniques people use to get a model to bypass its safety rules.

Definition

A jailbreak taxonomy is an organized classification of the ways people try to make a model ignore its safety training, grouped by technique, by goal, and by the reason each attempt succeeds. Mapping the landscape this way helps researchers understand the threats and build sturdier defenses.
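As a rough illustration of the three grouping axes in the definition, a taxonomy can be sketched as a list of records that each carry a technique, a goal, and a success mechanism. The class name `JailbreakEntry`, the helper `group_by`, and every label in `ENTRIES` are hypothetical placeholders chosen for this sketch; they are not taken from any particular published scheme.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JailbreakEntry:
    """One taxonomy entry, described along the three axes in the definition."""
    name: str
    technique: str          # how the attempt is carried out
    goal: str               # what the attempt is trying to obtain
    success_mechanism: str  # why the attempt can work


def group_by(entries, axis):
    """Group entry names by the value of one axis (a field name)."""
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, axis)].append(entry.name)
    return dict(groups)


# Hypothetical entries: the labels are illustrative assumptions only.
ENTRIES = [
    JailbreakEntry("example-a", "role-play", "restricted content",
                   "competing objectives"),
    JailbreakEntry("example-b", "encoding", "restricted content",
                   "mismatched generalization"),
    JailbreakEntry("example-c", "role-play", "system prompt disclosure",
                   "competing objectives"),
]

by_technique = group_by(ENTRIES, "technique")
by_goal = group_by(ENTRIES, "goal")
```

Keeping the axes as separate fields means the same entries can be regrouped by any axis, which mirrors the idea that one attempt can be classified by how it is done, what it is after, and why it works.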