All terms
Inference & Serving
Suffix Decoding
A model-free speculative method that copies continuations from matching earlier text.
Definition
Suffix decoding is a lightweight technique that needs no extra model: it looks for places where the ending of the text being written matches text seen earlier in the prompt or output. When it finds a match, the system copies or extends that continuation as a guess instead of running the full model, then checks the guess. It is especially effective for repetitive or structured generation such as code, data formats like JSON, or repeated tool-using loops.