Inference & Serving

Greedy Decoding

Always picking the single highest-probability token at each generation step.

Definition

Greedy decoding is the simplest decoding strategy (the rule for choosing each next token during generation). At every step the model picks the single highest-scoring token (a word or word-piece), so the same model and prompt always produce the same output. It is fast and predictable, but it can become repetitive or get stuck, because the best choice at one step can lead to a less probable sentence overall. Common alternatives are sampling methods, which give up determinism for diversity, and beam search, which gives up some speed to find higher-probability sequences.
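The following is a minimal sketch, assuming a hypothetical toy "model" (`TOY_MODEL`, `next_token_probs`) that returns next-token probabilities conditioned only on the previous token. It is not any real library's API. The sketch shows greedy decoding as a loop of argmax choices and contrasts it with an exhaustive search over the same toy model. That comparison makes the "locally best, globally worse" failure concrete.

```python
import math

# Hypothetical toy "language model": maps the previous token to a
# probability distribution over the next token. A real model conditions
# on the whole prefix and scores a large vocabulary.
TOY_MODEL = {
    "<s>": {"the": 0.5, "a": 0.4, "</s>": 0.1},
    "the": {"cat": 0.3, "dog": 0.3, "</s>": 0.4},
    "a":   {"cat": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def next_token_probs(prefix):
    # This toy model only looks at the last token of the prefix.
    return TOY_MODEL[prefix[-1]]

def greedy_decode(max_len=10):
    tokens = ["<s>"]
    for _ in range(max_len):
        probs = next_token_probs(tokens)
        # Greedy step: take the single highest-probability token.
        # Ties resolve to the first key in dict insertion order.
        best = max(probs, key=probs.get)
        tokens.append(best)
        if best == "</s>":
            break
    return tokens

def sequence_prob(tokens):
    # Probability of a full sequence = product of its per-step probabilities.
    p = 1.0
    for i in range(1, len(tokens)):
        p *= next_token_probs(tokens[:i])[tokens[i]]
    return p

def best_sequence_exhaustive(max_len=10):
    # Brute-force search over every path. Feasible only for tiny toys.
    best, best_p = None, 0.0
    stack = [(["<s>"], 1.0)]
    while stack:
        tokens, p = stack.pop()
        if tokens[-1] == "</s>":
            if p > best_p:
                best, best_p = tokens, p
            continue
        if len(tokens) > max_len:
            continue
        for tok, q in next_token_probs(tokens).items():
            stack.append((tokens + [tok], p * q))
    return best, best_p

if __name__ == "__main__":
    g = greedy_decode()
    print("greedy:", g, sequence_prob(g))
    print("best:  ", *best_sequence_exhaustive())
```

Greedy decoding commits to "the" because 0.5 > 0.4. That path ends with total probability 0.5 × 0.4 = 0.2. The path through "a" scores 0.4 × 0.9 = 0.36. Real implementations usually take the argmax over logits rather than probabilities. This gives the same choice, because softmax preserves the ordering of the scores. Beam search approximates the exhaustive search by keeping only the top few prefixes at each step.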