Training
Warmup
Ramping the learning rate up gradually at the start of training.
Definition
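Warmup gradually raises the learning rate from a very small value (often zero) to its target value over the first part of training, typically a fixed number of optimizer steps called the warmup period. This prevents large, destabilizing updates early on, when gradients are noisy and the model is far from a good solution. It is especially important for adaptive optimizers such as Adam, whose per-parameter step sizes depend on gradient statistics that are estimated from only a few samples at the start of training and can therefore be unreliable. Warmup is usually followed by a decay schedule such as cosine annealing.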
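A minimal sketch of one common combination follows: a linear ramp followed by cosine decay. The linear shape of the ramp, the function name warmup_cosine_lr, and its parameters (peak_lr, warmup_steps, total_steps, min_lr) are illustrative assumptions, not taken from any particular library. The function maps a 0-indexed step number to a learning rate.

```python
import math


def warmup_cosine_lr(step: int, peak_lr: float, warmup_steps: int,
                     total_steps: int, min_lr: float = 0.0) -> float:
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp: a small value at step 0, reaching peak_lr at step warmup_steps - 1.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine annealing from peak_lr (progress 0) down to min_lr (progress 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 50, 99, 100, 550, 1000):
        print(s, warmup_cosine_lr(s, peak_lr=1e-3, warmup_steps=100, total_steps=1000))
```

With peak_lr=1e-3 and warmup_steps=100, the rate starts at 1e-5, climbs to 1e-3 by the end of warmup, and then decays toward min_lr. Deep learning frameworks typically let you wrap a step-to-multiplier function like this in a scheduler object instead of computing the rate by hand.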