
Evaluation Contamination

When test material leaks into training data, making evaluation scores misleadingly high.

Definition

Evaluation contamination occurs when material from a test set leaks into a model's training data, so the model has effectively seen the answers and its scores reflect memorization rather than its ability to handle unseen inputs. It often happens when benchmark questions appear in large web-scraped corpora. Detecting and removing this overlap, through deduplication and contamination checks, is needed to keep evaluations trustworthy and comparisons between models fair.
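
One common style of contamination check looks for word n-grams shared between test items and training text. The following is a minimal sketch of that idea, not a standard implementation: the function names `ngrams` and `find_contaminated`, the window size `n`, and the example strings are all assumptions made for illustration.

```python
import re

def ngrams(text, n=8):
    """Return the set of word-level n-grams in `text`.

    Text is lowercased and split on non-alphanumeric characters, so
    punctuation and casing differences do not hide an overlap.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def find_contaminated(test_items, training_docs, n=8):
    """Return indices of test items that share at least one n-gram
    with any training document (hypothetical helper)."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return [i for i, item in enumerate(test_items)
            if ngrams(item, n) & train_grams]

# Illustrative data only: the first test item appears inside a scraped page.
test_items = [
    "What is the capital of France? Paris is the capital of France.",
    "How many legs does a spider have in total?",
]
training_docs = [
    "Quiz dump: what is the capital of France? Paris is the capital of France. Next question.",
]

flagged = find_contaminated(test_items, training_docs, n=5)
print(flagged)
```

Flagged items can then be dropped from the test set or reported separately. Note that in this sketch an item with fewer than `n` words produces no n-grams and is never flagged, and paraphrased or translated leaks are not caught by exact n-gram matching; a smaller `n` catches more overlap at the cost of more false positives.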