Evaluation

Pass@k

The probability that at least one of k sampled solutions to a problem is correct.

Definition

Pass@k measures the probability that at least one of k independent attempts at a problem is correct. It is usually estimated without bias by generating a larger pool of n ≥ k attempts, counting the c correct ones, and computing the chance that a random subset of k attempts from that pool contains at least one correct attempt. It suits tasks like code generation, where a model can try several times and any working solution counts. Pass@1 reflects how often a single attempt succeeds, while larger k shows how much additional attempts improve the chance of success.
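As an illustration, the pool-based estimate can be sketched as follows. This is a minimal sketch, not a reference implementation: the function name pass_at_k and the parameter names n (attempts generated), c (attempts that were correct), and k are assumptions chosen for this example. It uses the commonly cited combinatorial form 1 - C(n - c, k) / C(n, k), which is the probability that a random size-k subset of the pool contains at least one correct attempt.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from a pool of n attempts.

    n: total attempts generated (hypothetical parameter name)
    c: number of those attempts that were correct
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets made only of incorrect
    # attempts; it is 0 when fewer than k incorrect attempts exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 attempts, 3 of them correct.
single_try = pass_at_k(10, 3, 1)   # chance that one attempt succeeds
five_tries = pass_at_k(10, 3, 5)   # chance that at least one of five succeeds
```

For a benchmark with many problems, the per-problem values would typically be averaged. With k = 1 the estimate reduces to the fraction of correct attempts, c / n.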
