Inference & Serving
Decoding Strategy
The rule for choosing each next token from a model's predicted distribution.
Definition
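A decoding strategy is the rule that selects each next token (word-piece) from the probability distribution a model predicts over its vocabulary, shaping how creative, coherent, or predictable the output is. Greedy decoding always takes the single most likely token, while sampling methods draw at random from a restricted pool of high-probability choices: top-k keeps the k most likely tokens, and top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p. Temperature rescales the distribution before a token is chosen, with lower values making output more predictable and higher values making it more diverse. Beam search instead tracks several candidate sequences in parallel and keeps the highest-scoring ones, trading randomness for more globally consistent output.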
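To make the differences concrete, here is a minimal sketch of greedy decoding, temperature scaling, top-k, and top-p for a single step. It uses only the Python standard library, and every name in it (`softmax`, `greedy`, `top_k_filter`, `top_p_filter`, `sample`, and the four-token `logits` list) is a hypothetical illustration rather than any particular library's API. Beam search is left out because it operates on whole sequences rather than a single step.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """Return the index of the single most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def sample(probs, rng=random):
    """Draw one token index at random, weighted by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a four-token vocabulary (assumed for illustration).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.7)

print(greedy(probs))                      # always index 0, the highest score
print(sample(top_k_filter(probs, k=2)))   # index 0 or 1
print(sample(top_p_filter(probs, p=0.9))) # drawn from the nucleus only
```

In practice these filters are often combined, for example applying temperature first and then top-p, but the order and defaults vary between implementations.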