Scheduler
The component that decides which requests run and in what order over time.
Definition
A scheduler is the component of a serving system that decides how work is assigned over time: which requests to admit, which to batch together, and which to defer. In language model inference it manages request queues, groups requests into batches for efficiency, and applies policies such as iteration-level scheduling, which revisits the batch after every decoding step so that new requests can join an in-flight batch and finished ones can leave it. Good scheduling balances aggregate throughput against per-request latency as load changes.
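To make iteration-level scheduling concrete, here is a minimal Python sketch, not a production design. All names (`Request`, `IterationLevelScheduler`, `tokens_left`, `drain`) are hypothetical, and each request is assumed to need a known number of decode steps, which stands in for real token generation. Admission is plain first-in, first-out, and the only limit is a fixed batch size.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    tokens_left: int  # assumed: decode steps this request still needs


class IterationLevelScheduler:
    """Toy scheduler: re-evaluates the batch before every decode step."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.waiting = deque()  # FIFO queue of requests not yet running
        self.running = []       # the in-flight batch

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> list:
        # Admit waiting requests into any free batch slots.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())

        # Simulate one decode iteration for every running request.
        finished = []
        for req in self.running:
            req.tokens_left -= 1
            if req.tokens_left <= 0:
                finished.append(req.rid)

        # Finished requests leave immediately, freeing slots for the next step.
        self.running = [r for r in self.running if r.tokens_left > 0]
        return finished


def drain(sched: IterationLevelScheduler) -> list:
    """Run steps until no work remains; return the ids finished at each step."""
    timeline = []
    while sched.waiting or sched.running:
        timeline.append(sched.step())
    return timeline
```

With a batch size of 2 and three requests needing 3, 1, and 2 steps, the second request finishes after the first iteration and the third takes its slot on the next one, without waiting for the first request to complete. A scheduler that only formed batches at request granularity would hold that third request back until the whole batch had finished.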