Safety & Alignment
Model Inversion
Recovering training information by probing a model's outputs.
Definition
Model inversion is a privacy attack that attempts to reconstruct information about a model's training data by sending crafted queries and analyzing the outputs. In some cases it can recover sensitive attributes or approximate representations of individual records. It is mainly a concern for models trained on private data. Defenses include training with differential privacy and limiting how much detail the model's outputs expose.
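To make the idea concrete, here is a minimal sketch of a black-box inversion against a toy classifier. Everything in it is an assumption made for illustration: the PROTOTYPES array is made-up data standing in for averages of private training records, predict is a hand-built softmax model standing in for a trained one, and invert and cosine are hypothetical helper names. The attacker only calls the model, estimates a gradient from small changes in the returned confidence, and climbs toward an input the model strongly associates with the target class.

```python
import numpy as np

# Hypothetical per-class averages of "private" training records (made up for illustration).
PROTOTYPES = np.array([
    [1.0, 0.0, 2.0, 0.5],
    [0.0, 2.0, 0.5, 1.0],
    [2.0, 1.0, 0.0, 0.0],
])

def predict(x):
    """Toy victim model: softmax class probabilities for input x.

    The weights are set directly from PROTOTYPES, standing in for a trained model.
    """
    logits = PROTOTYPES @ x - 0.5 * (PROTOTYPES ** 2).sum(axis=1)
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()

def invert(query, target, dim, steps=300, lr=0.05, eps=1e-4):
    """Black-box inversion: raise the target-class confidence using only queries."""
    x = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for i in range(dim):
            step = np.zeros(dim)
            step[i] = eps
            # Central finite difference of the log-confidence along coordinate i.
            hi = np.log(query(x + step)[target] + 1e-12)
            lo = np.log(query(x - step)[target] + 1e-12)
            grad[i] = (hi - lo) / (2 * eps)
        x += lr * grad  # gradient ascent on the target-class log-confidence
    return x

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reconstruction = invert(predict, target=0, dim=4)
similarities = [cosine(reconstruction, p) for p in PROTOTYPES]
print("confidence for target class:", predict(reconstruction)[0])
print("cosine similarity to each prototype:", similarities)
```

In this toy setting the reconstruction is only an approximation: it points in a direction correlated with the target class prototype rather than matching it exactly. The sketch also relies on fine-grained confidence values to estimate gradients, which is one reason limiting output detail is listed as a defense.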