Inference & Serving
Microbatching
Splitting a batch into smaller sub-batches for scheduling or memory reasons.
Definition
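Microbatching splits a batch of work into smaller sub-batches (microbatches) that are processed one after another instead of all at once. It is used to fit work within available memory, because each sub-batch needs less working memory than the full batch; to overlap computation with communication across devices; and to give schedulers smaller units of work to interleave. In pipeline-parallel training and serving, it keeps multiple stages busy at once: while one stage works on a microbatch, the stage before it can already start on the next, which reduces idle time on the hardware.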
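As a rough illustration of the definition above, the sketch below splits a batch into microbatches and estimates how much pipeline idle time remains. The function names `split_into_microbatches` and `pipeline_idle_fraction` are hypothetical and not taken from any particular framework. The idle-time estimate assumes an idealized, GPipe-style schedule in which every stage takes the same time per microbatch and communication is free, so real systems will differ.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(
    batch: Sequence[T], microbatch_size: int
) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `batch` with at most `microbatch_size` items.

    The last microbatch may be smaller if the batch size is not a multiple
    of `microbatch_size`.
    """
    if microbatch_size <= 0:
        raise ValueError("microbatch_size must be positive")
    for start in range(0, len(batch), microbatch_size):
        yield batch[start:start + microbatch_size]


def pipeline_idle_fraction(num_stages: int, num_microbatches: int) -> float:
    """Estimate the idle ("bubble") fraction of an idealized pipeline.

    Assumption: each stage spends one equal-length time slot per microbatch.
    The whole batch then takes (num_microbatches + num_stages - 1) slots,
    of which each stage is idle for (num_stages - 1).
    """
    if num_stages <= 0 or num_microbatches <= 0:
        raise ValueError("num_stages and num_microbatches must be positive")
    total_slots = num_microbatches + num_stages - 1
    return (num_stages - 1) / total_slots


if __name__ == "__main__":
    batch = list(range(10))
    for microbatch in split_into_microbatches(batch, microbatch_size=4):
        print(microbatch)

    # Compare one large batch against eight microbatches on four stages.
    print(pipeline_idle_fraction(num_stages=4, num_microbatches=1))
    print(pipeline_idle_fraction(num_stages=4, num_microbatches=8))
```

Under these assumptions, a four-stage pipeline fed a single undivided batch is idle 75% of the time, while splitting the same batch into eight microbatches lowers that to 3/11, or about 27%. More microbatches shrink the idle fraction further, at the cost of smaller and potentially less efficient units of work per step.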