Multi-Token Prediction
Predicting several future tokens in one step instead of one at a time.
Definition
Multi-token prediction trains or modifies a model to output several future tokens (word pieces) in a single step rather than one at a time. Paired with a verification step, as in speculative decoding, it raises the number of tokens generated per second while preserving output quality, because each drafted token is checked against the model's own prediction before it is kept. The extra predictions act as drafts that the model confirms or rejects together in one pass, which is most useful on long outputs, where there are more decoding steps to save.
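The draft-then-check loop can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: `target_next` stands in for the full model's greedy next-token choice, `draft_next_k` stands in for a multi-token head that guesses `k` tokens at once (with deliberate errors), and tokens are small integers. A real system would verify all drafts in one batched forward pass; this sketch checks them in a plain loop and counts each draft-and-verify round as one step.

```python
from typing import List, Tuple

Token = int


def target_next(seq: List[Token]) -> Token:
    """Toy stand-in for the full model: a fixed function of the sequence."""
    return (seq[-1] * 3 + len(seq)) % 7


def draft_next_k(seq: List[Token], k: int) -> List[Token]:
    """Toy multi-token head: guesses k tokens at once, sometimes wrongly."""
    ctx, out = list(seq), []
    for _ in range(k):
        guess = target_next(ctx)
        if len(ctx) % 4 == 0:  # inject an error at some positions
            guess = (guess + 1) % 7
        out.append(guess)
        ctx.append(guess)
    return out


def verify(seq: List[Token], draft: List[Token]) -> List[Token]:
    """Keep the longest draft prefix the target agrees with, then add one
    token from the target itself, so every round yields at least one token."""
    ctx, accepted = list(seq), []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # all drafts accepted: bonus token
    return accepted


def generate_mtp(prompt: List[Token], n_new: int, k: int) -> Tuple[List[Token], int]:
    """Generate n_new tokens with draft-and-verify; return (sequence, steps)."""
    seq, steps = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        seq.extend(verify(seq, draft_next_k(seq, k)))
        steps += 1
    return seq[: len(prompt) + n_new], steps


def generate_baseline(prompt: List[Token], n_new: int) -> List[Token]:
    """Ordinary decoding: one token per step."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


if __name__ == "__main__":
    out, steps = generate_mtp([1, 2], n_new=20, k=3)
    print("same output as baseline:", out == generate_baseline([1, 2], 20))
    print("steps used:", steps, "vs 20 for one-at-a-time decoding")
```

Because every kept token is either a draft the target agreed with or the target's own choice, the output matches one-at-a-time decoding in this greedy setting; the saving comes only from needing fewer rounds.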