Data Augmentation
Expanding a dataset by transforming existing examples to add diversity and reduce overfitting.
Definition
Data augmentation increases the effective size and diversity of a training set by creating modified versions of existing examples, usually chosen so that each example's label remains valid. In vision this includes random crops, flips, and color shifts; in text it includes synonym replacement, back-translation, and paraphrasing. For language models it often means generating synthetic variations with the model itself or a teacher model. Augmentation helps models generalize and reduces overfitting, especially when labeled data is scarce.
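As a rough illustration of the vision and text transforms named above, here is a minimal sketch in plain Python. It assumes an image is a 2D list of pixel values and a sentence is a whitespace-separated string. The function names and the tiny SYNONYMS table are hypothetical, invented for this example; a real pipeline would more likely use an augmentation library and a proper lexical resource.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "cheerful"]}


def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, height, width, rng):
    """Cut a height x width window at a random position inside the image."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def synonym_replace(sentence, rng, p=0.5):
    """Swap each word that has an entry in SYNONYMS with probability p."""
    words = []
    for word in sentence.split():
        if word in SYNONYMS and rng.random() < p:
            words.append(rng.choice(SYNONYMS[word]))
        else:
            words.append(word)
    return " ".join(words)


def augment_image(image, rng):
    """Crop one row and one column off, then flip with probability 0.5."""
    out = random_crop(image, len(image) - 1, len(image[0]) - 1, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(augment_image(image, rng))
    print(synonym_replace("the quick dog is happy", rng))
```

Each call produces a different variant of the same underlying example, so the label attached to the original (an image class, a sentiment tag) can usually be reused unchanged. Transforms are typically applied on the fly during training rather than stored, so the model sees fresh variants every epoch.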