Safety & Alignment
Model Unlearning
Removing or reducing the influence of selected training data on a trained model.
Definition
Model unlearning is the process of removing or reducing the effect of selected training data on a model after it has already been trained, without retraining from scratch. It is used to honor data deletion requests, strip out copyrighted or private content, or erase harmful capabilities. Verifying that the information is truly gone, rather than merely suppressed, remains a hard open problem.
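As a rough illustration of the idea, the sketch below uses a toy nearest-centroid classifier whose parameters are just per-class sums and counts, so one example's contribution can be subtracted exactly. The class `CentroidModel` and the helper `train` are hypothetical names invented for this example, not part of any library. The sketch also assumes the forgotten example really was in the training set. Comparing the result against a model retrained without the forgotten data is one simple way to check the outcome in this toy setting. Neural networks do not decompose this cleanly, which is part of why verification is hard in practice.

```python
from dataclasses import dataclass, field


@dataclass
class CentroidModel:
    """Toy nearest-centroid classifier storing per-class sums and counts (hypothetical)."""
    sums: dict = field(default_factory=dict)    # label -> list of per-feature sums
    counts: dict = field(default_factory=dict)  # label -> number of examples seen

    def learn(self, x, label):
        # Add one example's features to the running sum for its class.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def unlearn(self, x, label):
        # Subtract the example's contribution. Assumes it was learned earlier.
        self.sums[label] = [s - v for s, v in zip(self.sums[label], x)]
        self.counts[label] -= 1
        if self.counts[label] == 0:
            del self.sums[label], self.counts[label]

    def centroids(self):
        # Mean feature vector per class.
        return {k: [s / self.counts[k] for s in v] for k, v in self.sums.items()}


def train(examples):
    """Build a model from (features, label) pairs."""
    model = CentroidModel()
    for x, label in examples:
        model.learn(x, label)
    return model


data = [([1.0, 2.0], "a"), ([3.0, 4.0], "a"), ([10.0, 0.0], "b"), ([12.0, 2.0], "b")]
forget = [([3.0, 4.0], "a")]

# Unlearn in place, without retraining from scratch.
unlearned = train(data)
for x, label in forget:
    unlearned.unlearn(x, label)

# Reference model: retrained from scratch on the retained data only.
retrained = train([ex for ex in data if ex not in forget])
```

Here `unlearned.centroids()` and `retrained.centroids()` can be compared directly. For models trained by gradient descent no such exact subtraction is available, so this should be read as a picture of the goal rather than a general method.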