All terms
Safety & Alignment
Guardrails
Safety controls that filter inputs and outputs to keep model behavior within policy.
Definition
Guardrails are the checks placed around a model at inference time to keep its behavior within policy. They can act on inputs (blocking harmful prompts), outputs (filtering or rewriting unsafe responses), or both, and range from keyword rules to classifier models and dedicated libraries. Operating across the prompt, model, and application layers, they complement training-time alignment with runtime enforcement.