Optimization

Learning Rate Scheduler

A policy that changes the learning rate over the course of training.

Definition

A learning rate scheduler adjusts the learning rate as training proceeds, following a predefined or adaptive policy. Common patterns include linear warmup followed by cosine decay, step decay, and polynomial decay. Proper scheduling, especially warmup paired with decay, is important for stable training of large transformers: it helps prevent early divergence and poor final performance.
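As an illustration of the warmup-plus-cosine pattern named above, here is a minimal sketch in plain Python. The function name `lr_at_step` and all default values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are assumptions chosen for the example, not recommended settings, and real frameworks may define the warmup ramp or the decay endpoint slightly differently.

```python
import math


def lr_at_step(step, base_lr=3e-4, warmup_steps=1000,
               total_steps=10000, min_lr=0.0):
    """Return the learning rate for a 0-indexed training step.

    Hypothetical helper: linear warmup to base_lr over warmup_steps,
    then cosine decay from base_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        # Linear ramp; (step + 1) avoids a learning rate of exactly 0 at step 0.
        return base_lr * (step + 1) / warmup_steps

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)

    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Print the schedule at a few steps to show its shape.
    for s in (0, 500, 999, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

In a training loop, the returned value would be written into the optimizer's learning rate before each update. In PyTorch, for example, a similar function returning a multiplier of the base rate can be passed to `torch.optim.lr_scheduler.LambdaLR`.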