All terms
Safety & Alignment
Refusal
When a model declines a request that violates its policies, ideally with a clear reason.
Definition
A refusal is when a model declines to answer or act on a request that conflicts with its safety policies, ideally explaining why rather than failing silently. Calibrating refusals is a balance: the model should reject genuinely harmful requests without over-refusing benign ones that merely resemble them. Tuning this trade-off is an active area of alignment and evaluation work.