
Data Curation

Choosing, filtering, cleaning, and balancing training data to raise its quality.

Definition

Data curation is the work of choosing, filtering, cleaning, and balancing training data to raise its quality. It includes removing low-quality, duplicated, toxic, or private content and balancing the mix of domains and languages. Careful curation, rather than simply adding raw volume, is now widely regarded as one of the most powerful levers for improving a model's performance and safety.
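To make the steps in the definition concrete, here is a minimal sketch of a curation pass in Python. Everything in it is an illustrative assumption rather than a standard recipe: the `curate` function, the `text` and `domain` record fields, the word-count threshold, the placeholder `BLOCKLIST`, the simple email pattern standing in for private-data detection, and the per-domain cap used as a crude form of balancing. Production pipelines typically rely on trained quality and toxicity classifiers, near-duplicate detection, and much broader PII handling.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: a deliberately simple email pattern standing in for PII detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Assumption: a placeholder blocklist standing in for a toxicity filter.
BLOCKLIST = {"badword"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def curate(records, min_words=5, max_per_domain=2):
    """Filter, clean, deduplicate, and cap records per domain (hypothetical helper)."""
    seen_hashes = set()
    per_domain = defaultdict(int)
    kept = []
    for rec in records:
        # Clean: redact private content (here, only email addresses).
        text = EMAIL_RE.sub("[EMAIL]", rec["text"])
        norm = normalize(text)
        words = norm.split()
        # Filter: drop very short, low-information records.
        if len(words) < min_words:
            continue
        # Filter: drop records containing blocklisted terms.
        if any(w in BLOCKLIST for w in words):
            continue
        # Deduplicate: exact match on the normalized text.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # Balance: cap how many records any single domain contributes.
        if per_domain[rec["domain"]] >= max_per_domain:
            continue
        per_domain[rec["domain"]] += 1
        kept.append({"text": text, "domain": rec["domain"]})
    return kept


SAMPLE = [
    {"text": "Gradient descent updates weights using the loss gradient.", "domain": "ml"},
    {"text": "gradient descent   updates weights using the loss gradient.", "domain": "ml"},
    {"text": "Too short", "domain": "ml"},
    {"text": "Contact alice@example.com for the full dataset access details.", "domain": "web"},
    {"text": "This sentence contains badword and should be dropped here.", "domain": "web"},
    {"text": "Learning rate schedules change the step size over time.", "domain": "ml"},
    {"text": "Batch normalization stabilizes training of deep neural networks.", "domain": "ml"},
]

kept = curate(SAMPLE)
for rec in kept:
    print(rec["domain"], "|", rec["text"])
```

On the sample data, the near-duplicate, the too-short record, and the blocklisted record are dropped, the email address is redacted, and the last `ml` record is discarded by the domain cap. The cap keeps the earliest records in each domain, which is a simplification; a real pipeline would more likely sample or reweight domains toward a target mix.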