Inference & Serving

Top-p Sampling

Sampling from the smallest set of tokens whose probabilities add up to at least a threshold p.

Definition

Top-p sampling, also called nucleus sampling, sorts tokens by probability and keeps only the smallest set whose cumulative probability reaches or exceeds a threshold p, such as 0.9. The probabilities in that set (the nucleus) are renormalized, and the next token is sampled from it. Unlike top-k, which keeps a fixed number of tokens, top-p adapts the candidate pool to the model's confidence: the nucleus is large when the distribution is flat and small when a few tokens dominate. It is commonly paired with temperature.
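
The procedure in the definition can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: the function names `top_p_filter` and `sample_top_p` are hypothetical, and the input is assumed to be an already-normalized probability vector (for example, a softmax over temperature-scaled logits).

```python
import numpy as np


def top_p_filter(probs, p=0.9):
    """Zero out tokens outside the nucleus and renormalize the rest.

    Hypothetical helper; assumes `probs` is a 1-D probability vector.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Token indices sorted from most to least probable.
    order = np.argsort(-probs, kind="stable")
    cumulative = np.cumsum(probs[order])
    # Smallest prefix of the sorted tokens whose cumulative mass is >= p.
    cutoff = int(np.searchsorted(cumulative, p, side="left")) + 1
    cutoff = min(cutoff, len(probs))  # guard against rounding when p is near 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()


def sample_top_p(probs, p=0.9, rng=None):
    """Draw one token index from the renormalized nucleus."""
    rng = np.random.default_rng() if rng is None else rng
    nucleus = top_p_filter(probs, p)
    return int(rng.choice(len(nucleus), p=nucleus))


# A flat distribution keeps several tokens; a peaked one keeps a single token.
flat = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
peaked = top_p_filter([0.97, 0.01, 0.01, 0.01], p=0.9)
print(np.count_nonzero(flat), np.count_nonzero(peaked))
```

With the same threshold, the flatter distribution yields a larger nucleus than the peaked one, which is the adaptive behavior that distinguishes top-p from a fixed top-k cutoff.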