Evaluation
Benchmark Contamination
When test questions leak into training data, inflating scores through memorization.
Definition
Benchmark contamination occurs when benchmark questions or answers appear in a model's training data, so high scores may reflect memorization rather than genuine ability. It undermines fair comparison, since a contaminated model can look stronger than it is. The problem motivates fresh, private, or frequently rotated test sets and careful checks of the training text for overlap with test items.
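Example
One common way to check training text for leaked test items is to look for long word sequences shared between the two. The sketch below is a minimal illustration under stated assumptions, not a standard tool: the function names (`ngrams`, `contamination_rate`), the whitespace tokenization, and the default window of 8 words are all hypothetical choices made for this example.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (whitespace tokenization)."""
    tokens = text.lower().split()
    # Texts shorter than n words yield an empty set and are never flagged.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(
    benchmark_items: Iterable[str],
    training_docs: Iterable[str],
    n: int = 8,  # assumed window size; real checks tune this
) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training docs."""
    train_ngrams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)

    items = list(benchmark_items)
    if not items:
        return 0.0

    # An item counts as flagged if any of its n-grams also occurs in training text.
    flagged = sum(1 for item in items if ngrams(item, n) & train_ngrams)
    return flagged / len(items)


if __name__ == "__main__":
    training = ["the quick brown fox jumps over the lazy dog near the river bank"]
    benchmark = [
        "quick brown fox jumps over the lazy dog near",        # overlaps training text
        "what is the capital of france and why is it famous",  # no overlap
    ]
    print(contamination_rate(benchmark, training))
```

Exact n-gram matching only catches verbatim leaks. Paraphrased or translated test items would pass this check, so a low rate suggests but does not prove that a benchmark is clean.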