Skip to main content
All terms
Safety & Alignment

System Prompt Leakage

Exposure of the hidden system instructions that are meant to stay private.

Definition

System prompt leakage is the exposure of the hidden system instructions that configure a model's behavior, persona, or constraints and are meant to stay private. Attackers coax a model into repeating these instructions, which can reveal proprietary prompt designs, internal rules, or clues for bypassing safeguards. It is often pursued through jailbreaks or prompt injection and motivates not relying on a secret system prompt alone for security.