Foundations
Self-Supervised Learning
Learning from labels derived automatically from the data itself, without human annotation.
Definition
Self-supervised learning creates its own training signal from unlabeled data, for example by hiding parts of the input and predicting them. For text, this is usually next-token or masked-token prediction; for images, methods often train the model to produce matching representations for differently augmented views of the same image. Because no manual labeling is needed, this approach enables pretraining on vast unlabeled corpora, and it is the foundation of modern large language models.
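As a rough illustration of how the text objectives get their labels from the data alone, the sketch below builds training targets from a plain token list. The function names, the `[MASK]` placeholder, the whitespace tokenization, and the masking probability are all assumptions made for this example, not part of any particular library or model.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch


def next_token_pairs(tokens):
    """Next-token prediction: each prefix is an input, the following token is its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Masked-token prediction: hide some tokens and keep the originals as labels.

    Returns the masked sequence and a dict mapping each masked position
    to the token that was hidden there.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


# Toy "corpus": whitespace splitting stands in for a real tokenizer.
tokens = "the cat sat on the mat".split()

pairs = next_token_pairs(tokens)
print(pairs[0])  # (['the'], 'cat')

masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked, targets)
```

In both cases the labels come from the text itself: nothing outside the original sequence is needed to produce `pairs` or `targets`, which is what makes the training signal self-supervised.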