Inference & Serving

Microbatching

Splitting a batch into smaller sub-batches for scheduling or memory reasons.

Definition

Microbatching splits a batch of work into smaller sub-batches (microbatches) that are processed one after another. It is used to fit work within available memory, to overlap computation with communication across devices, and to give schedulers finer-grained control. In pipeline-parallel training and serving, it keeps multiple stages busy at once: while one stage works on a microbatch, the preceding stage can already start on the next one, which reduces idle time on the hardware.
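
As a rough illustration of the definition, the sketch below splits a batch into microbatches and estimates how much of a pipeline sits idle. It rests on simplifying assumptions that are not part of the definition itself: every stage takes one equal time step per microbatch, communication is free, and only a single forward pass is modeled. The function names `split_into_microbatches` and `pipeline_idle_fraction` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(batch: Sequence[T], microbatch_size: int) -> List[Sequence[T]]:
    """Split a batch into consecutive microbatches of at most microbatch_size items."""
    if microbatch_size <= 0:
        raise ValueError("microbatch_size must be positive")
    return [
        batch[start:start + microbatch_size]
        for start in range(0, len(batch), microbatch_size)
    ]


def pipeline_idle_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of stage time slots left idle in an idealized pipeline.

    Assumption: each stage needs exactly one time step per microbatch, so the
    whole pass takes (num_microbatches + num_stages - 1) steps, and each stage
    is idle for (num_stages - 1) of them while the pipeline fills and drains.
    """
    if num_stages <= 0 or num_microbatches <= 0:
        raise ValueError("num_stages and num_microbatches must be positive")
    total_steps = num_microbatches + num_stages - 1
    return (num_stages - 1) / total_steps


if __name__ == "__main__":
    batch = list(range(8))
    # The last microbatch is smaller when the batch does not divide evenly.
    print(split_into_microbatches(batch, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7]]

    # With a single undivided batch, a 4-stage pipeline is idle 75% of the time.
    print(pipeline_idle_fraction(num_stages=4, num_microbatches=1))  # 0.75

    # More microbatches shrink the idle fraction under the same assumptions.
    for m in (2, 4, 8, 16):
        print(m, round(pipeline_idle_fraction(num_stages=4, num_microbatches=m), 3))
```

Under these assumptions the idle fraction falls as the number of microbatches grows. Real systems also have to weigh per-microbatch overhead and the memory needed to hold in-flight microbatches, so the best microbatch size is usually found by measurement.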