Evaluation

Pairwise Comparison

Judging outputs by directly comparing two candidates and picking the better one.

Definition

Pairwise comparison evaluates outputs by showing a judge, human or model, two candidate responses to the same prompt and asking which is better. Judging relative quality is often easier and more reliable than assigning each response an absolute score in isolation. The individual comparisons serve as preference data, and aggregating many of them yields metrics such as win rate and Elo-style ratings, which are commonly used to rank systems on leaderboards.
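
To make the aggregation step concrete, here is a minimal sketch of how a list of pairwise judgments might be turned into win rates and Elo-style ratings. The function names, the `(a, b, winner)` tuple layout, the candidate names, the K-factor of 32, and the starting rating of 1000 are all assumptions chosen for illustration, not part of any particular leaderboard's method.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Fraction of comparisons each candidate won; a tie counts as half a win."""
    wins = defaultdict(float)
    games = defaultdict(int)
    for a, b, winner in comparisons:
        games[a] += 1
        games[b] += 1
        if winner is None:  # the judge declared a tie
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            wins[winner] += 1.0
    return {name: wins[name] / games[name] for name in games}


def elo_ratings(comparisons, k=32.0, initial=1000.0):
    """Sequential Elo updates; k and initial are assumed, conventional values."""
    ratings = defaultdict(lambda: initial)
    for a, b, winner in comparisons:
        # Expected score of a under the standard logistic Elo curve.
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        score_a = 0.5 if winner is None else (1.0 if winner == a else 0.0)
        delta = k * (score_a - expected_a)
        ratings[a] += delta
        ratings[b] -= delta  # zero-sum: b loses exactly what a gains
    return dict(ratings)


# Hypothetical judgments: (candidate_a, candidate_b, winner or None for a tie).
comparisons = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_y"),
    ("model_x", "model_y", None),
]

rates = win_rates(comparisons)
ratings = elo_ratings(comparisons)
print(rates)
print(ratings)
```

Sequential Elo updates depend on the order in which comparisons are processed, so shuffling the same judgments can give slightly different ratings. Win rate has no such dependence, but it ignores the strength of the opponents faced.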
