Data
Data Curation
Choosing, filtering, cleaning, and balancing training data to raise its quality.
Definition
Data curation is the work of choosing, filtering, cleaning, and balancing training data to raise its quality. It includes removing low-quality, duplicated, toxic, or private content, and balancing the mix of domains and languages so that no single source dominates. Careful curation, rather than simply adding raw volume, is now widely regarded as one of the most effective levers for improving a model's performance and safety.
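As a rough illustration, the following sketch chains several of the steps named in the definition: a quality filter, exact deduplication, private-data redaction, and a per-domain cap for balancing. It is a toy under stated assumptions, not a reference implementation. The thresholds, the email regex, and the function names (`curate`, `is_low_quality`, `redact_private`, `normalize`) are all hypothetical. Production pipelines typically use fuzzy deduplication (e.g. MinHash), trained quality and toxicity classifiers, and far more thorough PII detection. Toxicity filtering is omitted here entirely.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_WORDS = 5
MIN_ALPHA_RATIO = 0.6
# Simplistic email pattern (assumption); real PII detection is much broader.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash the same."""
    return " ".join(text.lower().split())


def is_low_quality(text: str) -> bool:
    """Crude heuristic: too few words, or too few alphabetic characters."""
    if len(text.split()) < MIN_WORDS:
        return True
    non_space = [c for c in text if not c.isspace()]
    alpha = sum(c.isalpha() for c in non_space)
    return alpha / len(non_space) < MIN_ALPHA_RATIO


def redact_private(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("<EMAIL>", text)


def curate(docs, max_per_domain):
    """docs: iterable of (domain, text) pairs. Returns the kept (domain, text) pairs."""
    seen = set()
    per_domain = defaultdict(int)
    kept = []
    for domain, text in docs:
        if is_low_quality(text):
            continue  # quality filter
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        if per_domain[domain] >= max_per_domain:
            continue  # cap stops one domain from dominating the mix
        per_domain[domain] += 1
        kept.append((domain, redact_private(text)))
    return kept


if __name__ == "__main__":
    docs = [
        ("web", "The quick brown fox jumps over the lazy dog."),
        ("web", "the quick  brown fox jumps over the lazy dog."),  # near-copy
        ("web", "!!! ### $$$ 123 456"),  # mostly symbols and digits
        ("code", "Contact me at alice@example.com for the full source listing."),
        ("web", "Another clean web sentence with enough words in it."),
        ("web", "A third distinct web document that should hit the cap."),
    ]
    for domain, text in curate(docs, max_per_domain=2):  # keeps 3 of 6 docs
        print(domain, text)
```

The per-domain cap here keeps documents in arrival order, which is the simplest possible form of balancing; real pipelines more often reweight or resample domains and languages toward a target mixture.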