Skip to main content
All terms
Inference & Serving

Constrained Decoding

Restricting which tokens a model can emit so output fits a schema or grammar.

Definition

Constrained decoding restricts which tokens (the word-pieces a model emits) it may generate at each step so the output conforms to a required format, such as a fixed shape or pattern. By blocking any token that would break that format, it guarantees structurally valid results without relying on the model to follow instructions perfectly. It is how serving systems reliably produce valid JSON or other structured output.