
Training Data

The examples a model learns from while its weights are adjusted.

Definition

Training data is the body of examples (text, images, audio, code, and more) that a model learns from while its weights are adjusted. It is kept separate from the validation and test data used to measure performance. Its scale, quality, diversity, and curation largely determine a model's capabilities and limitations. Foundation models are typically trained on web-scale collections, often supplemented with synthetically generated data.
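To make the separation between training, validation, and test data concrete, here is a minimal sketch of a random split. The function name `split_dataset`, its parameters, and the 80/10/10 default proportions are assumptions chosen for illustration, not a standard prescribed by this entry. Real pipelines often need more care, such as deduplicating near-identical examples across splits.

```python
import random


def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle examples and partition them into train/validation/test lists.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    shuffled = list(examples)              # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split

    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)

    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]      # everything left is training data
    return train, val, test


train, val, test = split_dataset(range(100))

# No example appears in more than one split.
assert not (set(train) & set(val))
assert not (set(train) & set(test))
assert not (set(val) & set(test))
```

Only `train` would be used to adjust weights. `val` guides choices made during development, and `test` is held back for a final performance measurement.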