Dynamic Batching
Forming batches from requests as they arrive instead of waiting for a fixed count.
Definition
Dynamic batching collects incoming requests over a short time window and runs them together as one batch. The batch is sent off when the window expires or a maximum batch size is reached, whichever comes first, rather than waiting for a fixed count. This makes better use of the GPU (the chip that runs the model) than static batching when request rates vary, reducing idle time. It differs from continuous batching in that each batch still runs as one whole unit, rather than requests being swapped in and out token by token during generation. Model servers such as NVIDIA Triton Inference Server support it natively.
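
The grouping rule can be illustrated with a small simulation. This is a minimal sketch, not how any particular server implements it: the function `form_batches` and the parameters `max_batch_size` and `max_wait_s` are hypothetical names chosen for this example, and it assumes the GPU is always free to take the next batch.

```python
from typing import List, Tuple


def form_batches(
    arrivals: List[Tuple[float, str]],
    max_batch_size: int,
    max_wait_s: float,
) -> List[List[str]]:
    """Group (arrival_time_seconds, request_id) pairs into batches.

    A batch opens when its first request arrives and closes when either
    max_batch_size requests are queued or max_wait_s has passed since
    that first arrival. All names here are illustrative assumptions.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    window_start = 0.0

    for arrival_time, request_id in sorted(arrivals):
        # The window ran out before this request arrived: send what we have.
        if current and arrival_time - window_start > max_wait_s:
            batches.append(current)
            current = []
        # The first request of a new batch starts a new window.
        if not current:
            window_start = arrival_time
        current.append(request_id)
        # The batch is full: send it without waiting for the window to end.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []

    # Send whatever is left once no more requests are coming.
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Three requests in a slow trickle, then a burst of five.
    arrivals = [
        (0.000, "a"), (0.002, "b"), (0.004, "c"),
        (0.030, "d"), (0.031, "e"), (0.032, "f"), (0.033, "g"), (0.034, "h"),
    ]
    for batch in form_batches(arrivals, max_batch_size=4, max_wait_s=0.010):
        print(batch)
```

With a 10 ms window and a maximum of four requests per batch, the slow trickle goes out as a partial batch of three when its window expires, while the burst fills a batch of four immediately and the last request starts a new batch. A static batcher with a fixed count of four would instead have held the first three requests until the fourth arrived.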