All terms
Safety & Alignment
System Prompt Leakage
Exposure of the hidden system instructions that are meant to stay private.
Definition
System prompt leakage is the exposure of the hidden system instructions that configure a model's behavior, persona, or constraints and are meant to stay private. Attackers coax a model into repeating these instructions, which can reveal proprietary prompt designs, internal rules, or clues for bypassing safeguards. It is often pursued through jailbreaks or prompt injection and motivates not relying on a secret system prompt alone for security.