Training

Warmup

Ramping the learning rate up gradually at the start of training.

Definition

Warmup gradually raises the learning rate from a very small value (often zero or close to it) to its target value over the first steps of training. This prevents large, destabilizing updates early on, when gradients are noisy and the model is far from a good solution. It is especially useful with adaptive optimizers like Adam, whose per-parameter step sizes rely on gradient statistics that are estimated from only a few batches at the start of training. Warmup is usually followed by a decay schedule such as cosine annealing.
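
Example

To make the shape of such a schedule concrete, here is a minimal sketch of linear warmup followed by cosine decay, written in plain Python. The function name `lr_at_step`, its parameters, and the default values are illustrative assumptions, not part of any particular library, and linear ramping is only one common way to implement warmup.

```python
import math


def lr_at_step(step, peak_lr=1e-3, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Learning rate for a 0-indexed training step (hypothetical helper).

    Assumes linear warmup to peak_lr over warmup_steps, then cosine
    decay from peak_lr down to min_lr by total_steps.
    """
    if step < warmup_steps:
        # Linear ramp: a small fraction of peak_lr at step 0,
        # reaching exactly peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)

    # Cosine annealing from peak_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 499, 999, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each update. Most deep learning frameworks ship their own scheduler utilities that can be combined to the same effect, so a hand-written function like this is mainly useful for seeing how the two phases fit together.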