Gradient Clipping

Capping the size of each training adjustment before it is applied, to keep training stable.

Definition

Gradient clipping limits the size of the adjustment the model wants to make (the gradient) before that adjustment is applied. The usual approach scales the whole adjustment down by a single factor when its overall size (its norm) grows past a set limit, which shrinks the step without changing its direction. This guards against exploding gradients, a common instability in deep networks in which adjustments grow very large and derail training. It is a lightweight, nearly universal technique in large-model training, and it differs from weight clipping, which limits the model's weights directly rather than the adjustments made to them.
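
The scaling step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function name `clip_by_global_norm`, the `max_norm` parameter, and the representation of gradients as lists of floats are all assumptions made for the example.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale every gradient by one shared factor if the combined L2 norm exceeds max_norm.

    `grads` is a list of gradient vectors (one per parameter group), each a list of floats.
    Returns the (possibly scaled) gradients and the norm measured before clipping.
    """
    # Combined L2 norm over all values in all gradient vectors.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))

    # Within the limit: return an unchanged copy.
    if total_norm <= max_norm:
        return [list(vec) for vec in grads], total_norm

    # Past the limit: shrink everything by the same factor, so the direction is kept.
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm

# Hypothetical gradients whose combined norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before)
print(clipped)
```

Because every value is multiplied by the same factor, the clipped adjustment points in the same direction as the original and only its length changes. In practice, frameworks provide this operation directly; PyTorch, for example, offers `torch.nn.utils.clip_grad_norm_`, which is typically called after the backward pass and before the optimizer step.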