Training
Gradient Clipping
Capping the size of each training adjustment before it is applied, to keep training stable.
Definition
Gradient clipping limits the size of the gradient, the adjustment a training step computes for the model's weights, before that adjustment is applied. The most common form rescales the whole gradient when its overall size (its norm) exceeds a set threshold, so the direction of the update is kept and only its length shrinks. This guards against exploding gradients, a common instability in deep networks that can derail training. It is a lightweight technique used widely in large-model training, and it differs from weight clipping, which limits the model's weights directly rather than the updates to them.
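To make the rescaling concrete, here is a minimal sketch of clipping by overall norm in plain Python. The function name `clip_by_global_norm`, the nested-list layout of the gradients, and the threshold of 1.0 are assumptions chosen for illustration, not something the definition above prescribes.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale every gradient by one shared factor so the combined
    L2 norm is at most max_norm. Returns (clipped_grads, norm_before)."""
    # Combined L2 norm across all parameter groups.
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    if total_norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return [list(group) for group in grads], total_norm
    # One shared factor keeps the update's direction and shrinks its length.
    scale = max_norm / total_norm
    return [[g * scale for g in group] for group in grads], total_norm

# Hypothetical gradients for two parameter groups.
# Combined norm = sqrt(3^2 + 4^2 + 12^2) = 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
```

Because every value is multiplied by the same factor, the ratios between gradient entries are preserved. Deep learning frameworks usually ship an equivalent utility (for example, PyTorch's `torch.nn.utils.clip_grad_norm_`), which would normally be used in place of hand-written code like this.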