All terms
Inference & Serving
Backpressure
Slowing or rejecting incoming requests when a system is overloaded.
Definition
Backpressure is a flow-control method that signals the systems sending requests to slow down when a system cannot keep up with incoming work. In a serving setup it surfaces as queuing, rejected requests, or throttling once the queue or the GPU's memory fills (the GPU being the chip that runs the model). Applying backpressure protects steady output and prevents a server from collapsing under load, at the cost of longer waits or dropped requests.