Optimization
Learning Rate Scheduler
A policy that changes the learning rate over the course of training.
Definition
A learning rate scheduler adjusts the learning rate as training proceeds, following either a predefined function of the training step or an adaptive rule that responds to training signals such as validation loss. Common patterns include linear warmup followed by cosine decay, step decay, and polynomial decay. Proper scheduling, especially warmup paired with decay, is important for stable training of large transformers: warmup helps prevent early divergence, and decay helps avoid poor final performance.
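As an illustration of the warmup-plus-decay pattern, here is a minimal sketch of linear warmup followed by cosine decay, written as a plain function of the training step. The function name `lr_at_step` and every default value (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are assumptions chosen for the example, not recommended settings or any particular library's API.

```python
import math


def lr_at_step(step, base_lr=3e-4, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Learning rate for a 0-indexed training step (illustrative defaults)."""
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to 1.0 past total_steps.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine decay: factor goes smoothly from 1.0 (progress 0) to 0.0 (progress 1).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 500, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

In a training loop, a function like this would typically be evaluated once per optimizer step and its value written into the optimizer's learning rate. Most deep learning frameworks provide built-in schedulers that play the same role.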