Inference & Serving

Guided Decoding

Masking invalid tokens during generation to force output into a target format.

Definition

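Guided decoding steers generation toward a desired structure by inspecting the model's logits (the raw scores for each candidate next token) and masking out any token that would violate the target format before one is selected. A finite-state machine or grammar tracks the current position in the format and determines which tokens are allowed next, so the output is guaranteed to be syntactically valid (for example, well-formed JSON), provided generation is not cut off before the structure is complete. Frameworks such as Outlines and SGLang implement it efficiently, typically by precomputing the set of valid tokens for each state rather than checking the whole vocabulary at every step.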
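The masking step can be illustrated with a minimal sketch in plain Python. Everything here is hypothetical and chosen for illustration: the seven-token vocabulary, the `TRANSITIONS` table, and the `toy_logits` stand-in for a model are assumptions, and this is not how Outlines or SGLang are implemented internally. The toy "model" always prefers the token `hello`, which would break the format, yet the mask forces valid JSON.

```python
import json
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ["{", "}", '"ok"', ":", "true", "false", "hello"]

# Hypothetical state machine for the format {"ok":true} or {"ok":false}.
# Each state maps an allowed token to the state reached after emitting it.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"ok"': 2},
    2: {":": 3},
    3: {"true": 4, "false": 4},
    4: {"}": 5},
}
ACCEPT = 5  # state reached once the structure is complete


def toy_logits(prefix):
    """Stand-in for a model: fixed scores that favor the invalid token 'hello'."""
    return [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 5.0]


def mask_logits(logits, allowed):
    """Set the score of every token outside `allowed` to negative infinity."""
    return [s if tok in allowed else -math.inf for tok, s in zip(VOCAB, logits)]


def guided_decode(logits_fn, max_steps=10):
    """Greedy decoding in which only format-preserving tokens can be chosen."""
    state, out = 0, []
    while state != ACCEPT and len(out) < max_steps:
        allowed = TRANSITIONS[state]
        masked = mask_logits(logits_fn(out), allowed)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        out.append(token)
        state = allowed[token]  # advance the tracker
    return "".join(out)


result = guided_decode(toy_logits)
print(result)
print(json.loads(result))
```

Without the mask, greedy selection over `toy_logits` would pick `hello` at every step. With it, the highest-scoring allowed token wins at each position, so the model's preferences still decide between valid alternatives such as `true` and `false`.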