All terms
Inference & Serving
Guided Decoding
Masking invalid tokens during generation to force output into a target format.
Definition
Guided decoding steers generation toward a desired structure by checking the model's raw scores for each possible next word and blocking any choice that would break the target format. A simple rule-tracker keeps count of which words are allowed at the current position, guaranteeing correctly formed output such as well-formed JSON. Frameworks like Outlines and SGLang implement it efficiently.