Safety & Alignment

Model Unlearning

Removing or reducing the influence of selected training data on a trained model.

Definition

Model unlearning is the process of removing or reducing the effect of selected training data on a model after it has already been trained, without retraining from scratch. It is used to honor data deletion requests, strip out copyrighted or private content, or erase harmful capabilities. Verifying that the information is truly gone, rather than merely suppressed, remains a hard open problem.
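Most unlearning methods for large networks are approximate. That is why the verification problem above is hard. For models whose entire training state is a sum of per-example contributions, exact unlearning is easy to illustrate. The sketch below is a hypothetical toy example, not a standard API: the class `SimpleLinearRegression` and its `learn`/`unlearn`/`params` methods are names chosen for illustration. The model is a one-variable least-squares fit that stores only additive sufficient statistics. Deleting an example therefore means subtracting that example's contribution, and the result matches retraining on the remaining data.

```python
from dataclasses import dataclass


@dataclass
class SimpleLinearRegression:
    """Hypothetical 1-D least-squares model y ~ a + b*x, kept as running sums."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def learn(self, x: float, y: float) -> None:
        # Add this example's contribution to every sufficient statistic.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def unlearn(self, x: float, y: float) -> None:
        # Subtract exactly what learn() added. Assumes (x, y) was previously
        # learned; this toy model cannot check that on its own.
        if self.n == 0:
            raise ValueError("nothing to unlearn")
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def params(self) -> tuple[float, float]:
        # Closed-form ordinary least squares from the sufficient statistics.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 100.0)]

model = SimpleLinearRegression()
for x, y in data:
    model.learn(x, y)

# Honor a deletion request for the last example without retraining.
model.unlearn(4.0, 100.0)

# Reference model trained from scratch on the retained data only.
retrained = SimpleLinearRegression()
for x, y in data[:-1]:
    retrained.learn(x, y)

print(model.params())      # (1.0, 2.0)
print(retrained.params())  # (1.0, 2.0)
```

In this toy model, the deleted example's contribution is removed exactly, so nothing remains to verify. Deep networks have no such per-example decomposition of their weights. Unlearning in large models therefore usually relies on approximate updates whose effectiveness is hard to confirm.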