Skip to main content
All terms
Foundations

Self-Supervised Learning

Learning from labels the model creates automatically from the data itself.

Definition

Self-supervised learning creates its own training signal from unlabeled data, for example by hiding parts of the input and predicting them. For text, this is usually next-token or masked-token prediction; for images, methods often match representations of augmented views. This approach enables pretraining on vast unlabeled corpora and is the foundation of modern large language models.