All terms
Inference & Serving
Constrained Decoding
Restricting which tokens a model can emit so output fits a schema or grammar.
Definition
Constrained decoding restricts which tokens (the word-pieces a model emits) it may generate at each step so the output conforms to a required format, such as a fixed shape or pattern. By blocking any token that would break that format, it guarantees structurally valid results without relying on the model to follow instructions perfectly. It is how serving systems reliably produce valid JSON or other structured output.