Evaluation

Benchmark Contamination

When test questions leak into training data, inflating scores through memorization.

Definition

Benchmark contamination occurs when benchmark questions or answers appear in a model's training data, so that high scores may reflect memorization rather than genuine ability. It undermines fair comparison, because a contaminated model can look stronger than it is. The problem motivates fresh, private, or frequently rotated test sets, along with checks for overlap between benchmark items and the text a model was trained on.
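
One simple way to check training text for leaked benchmark items is to look for shared word n-grams. The sketch below is illustrative only, not a standard tool: the function names (`ngrams`, `overlap_fraction`), the window size of 8 words, and the sample strings are all assumptions made for this example. Real checks typically run over much larger corpora and may use different tokenization or thresholds.

```python
import re
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text, lowercased, punctuation dropped."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item: str, training_docs: List[str], n: int = 8) -> float:
    """Fraction of the item's n-grams that also occur in any training document.

    n=8 is an assumed window size; shorter windows flag more coincidental matches.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        # Item is shorter than n words, so this check cannot say anything about it.
        return 0.0
    train_grams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return len(item_grams & train_grams) / len(item_grams)


# Hypothetical data: one training document quotes a benchmark question verbatim.
training_docs = [
    "Quiz dump: what is the capital of France and which river flows through it? "
    "Answer: Paris, Seine.",
    "The weather today is mild with a light breeze from the west.",
]
leaked = "What is the capital of France and which river flows through it?"
clean = "Name the largest moon of Saturn and the year it was discovered."

print(overlap_fraction(leaked, training_docs))  # 1.0
print(overlap_fraction(clean, training_docs))   # 0.0
```

A high fraction suggests the item appeared verbatim, or nearly so, in the training text. A low fraction does not prove the item is clean, since paraphrased or translated copies would not share exact n-grams.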