Inference & Serving
Iteration-Level Scheduling
Reassessing batch membership on every forward pass so that finished requests free their slots immediately.
Definition
Iteration-level scheduling is the mechanism behind continuous batching: before each iteration (one forward pass of the model), the scheduler rechecks which requests belong to the active batch, removing finished ones immediately and admitting waiting requests into the freed slots. This fine-grained control keeps the GPU (the chip that runs the model) busy and stops short requests from getting stuck behind long ones, a problem that static batching, which holds a batch together until every request in it has finished, can cause when requests differ in length. vLLM and SGLang both implement it.
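The difference is easiest to see in a toy simulation. The sketch below is illustrative only and is not how vLLM or SGLang are written: the function names `run_iteration_level` and `run_static`, the `max_batch_size` parameter, and the assumption that each forward pass produces exactly one token per active request are all made up for this example. Each request is a `(request_id, tokens_to_generate)` pair, and each function returns the iteration at which every request finished.

```python
from collections import deque


def run_iteration_level(requests, max_batch_size):
    """Toy iteration-level scheduler (hypothetical, for illustration).

    Batch membership is rechecked around every simulated forward pass.
    """
    waiting = deque(requests)
    running = {}       # request id -> tokens still to generate
    finish_step = {}   # request id -> iteration at which it finished
    step = 0
    while waiting or running:
        # Admit waiting requests into any free slots before this pass.
        while waiting and len(running) < max_batch_size:
            rid, length = waiting.popleft()
            running[rid] = length
        step += 1
        # Stand-in for one forward pass: one token per active request.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] <= 0:
                # Finished requests leave the batch right away.
                del running[rid]
                finish_step[rid] = step
    return finish_step


def run_static(requests, max_batch_size):
    """Baseline: a batch is held until every request in it has finished."""
    waiting = deque(requests)
    finish_step = {}
    step = 0
    while waiting:
        size = min(max_batch_size, len(waiting))
        batch = [waiting.popleft() for _ in range(size)]
        for rid, length in batch:
            finish_step[rid] = step + length
        # The next batch cannot start until the longest request is done.
        step += max(length for _, length in batch)
    return finish_step


# One long request followed by three short ones, two slots available.
requests = [("A", 8), ("B", 2), ("C", 2), ("D", 2)]
print(run_iteration_level(requests, max_batch_size=2))
# {'B': 2, 'C': 4, 'D': 6, 'A': 8}
print(run_static(requests, max_batch_size=2))
# {'A': 8, 'B': 2, 'C': 10, 'D': 10}
```

In this toy run, the short requests C and D finish at iterations 4 and 6 under iteration-level scheduling, because they take over the slot that B and then C vacate. Under the static baseline they wait for A and only finish at iteration 10.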