Inference & Serving
Top-p Sampling
Sampling from the smallest set of tokens whose cumulative probability reaches at least a threshold p.
Definition
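Top-p sampling, also called nucleus sampling, sorts tokens by probability and keeps only the smallest set whose cumulative probability reaches or exceeds a threshold p, such as 0.9. It then renormalizes the probabilities within that set (the nucleus) and samples the next token from it. Unlike top-k sampling, which always keeps a fixed number of candidates, top-p adapts the size of the candidate pool to the model's confidence. When the distribution is flat it keeps many tokens, and when a few tokens dominate it keeps only those. It is commonly combined with temperature scaling, which sharpens or flattens the distribution before sampling.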
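A minimal plain-Python sketch of the procedure described above, assuming the model's output is a list of raw logits. The names `nucleus` and `top_p_sample` are hypothetical helpers, not any library's API, and applying temperature before selecting the nucleus is an assumption about ordering, not a fixed rule.

```python
import math
import random


def nucleus(probs, p):
    """Return indices of the smallest set of tokens whose cumulative probability >= p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Sort token indices from most to least probable (stable for ties).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:  # stop as soon as the threshold is reached
            break
    return kept


def top_p_sample(logits, p=0.9, temperature=1.0, rng=random):
    """Hypothetical helper: temperature-scale logits, then sample from the nucleus."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Assumption: temperature is applied to the logits before the nucleus is chosen.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    kept = nucleus(probs, p)
    # random.choices accepts unnormalized weights, so this implicitly
    # renormalizes the probabilities within the nucleus.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# A confident distribution yields a tiny nucleus; a flat one yields a large nucleus.
print(nucleus([0.95, 0.03, 0.01, 0.01], 0.9))
print(nucleus([0.3, 0.2, 0.2, 0.15, 0.1, 0.05], 0.9))
print(top_p_sample([2.0, 1.0, 0.5, -1.0], p=0.9, rng=random.Random(0)))
```

With p = 0.9, the confident distribution keeps a single token, while the flatter one keeps five. This illustrates the adaptive pool size that distinguishes top-p from top-k.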