Safety & Alignment

Privacy Attack

An attempt to extract private or sensitive information from or through a model.

Definition

A privacy attack is an attempt to reveal private or sensitive information through a model, such as recovering memorized training examples, inferring whether a record was in the training set, or reconstructing input features. Common forms include training data extraction, membership inference, and model inversion. Defenses include differential privacy, deduplication, and filtering sensitive content out of training data.
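As a rough illustration of membership inference, the sketch below applies a simple loss-threshold heuristic: a record on which the model's loss is unusually low is guessed to have been in the training set. The confidence values, the threshold, and the function names are all hypothetical and chosen only for illustration; a practical attack would calibrate the threshold against held-out data or shadow models.

```python
import math

def cross_entropy(prob_true_label: float) -> float:
    """Per-example loss: negative log of the probability the model
    assigned to the correct label."""
    return -math.log(prob_true_label)

def infer_membership(prob_true_label: float, threshold: float) -> bool:
    """Guess 'member' when the loss falls below the threshold.
    Low loss suggests the model may have memorized the record."""
    return cross_entropy(prob_true_label) < threshold

# Hypothetical model confidences on the true label (made-up numbers).
member_probs = [0.99, 0.97, 0.95]      # records assumed to be in the training set
non_member_probs = [0.60, 0.55, 0.70]  # records assumed to be unseen

# Hypothetical threshold; in practice it would be tuned, not hard-coded.
THRESHOLD = 0.2

member_guesses = [infer_membership(p, THRESHOLD) for p in member_probs]
non_member_guesses = [infer_membership(p, THRESHOLD) for p in non_member_probs]

print("guesses for training records:", member_guesses)
print("guesses for unseen records:  ", non_member_guesses)
```

The gap in loss between seen and unseen records is what the attack exploits. Defenses such as differential privacy aim to shrink that gap so that a threshold of this kind separates the two groups no better than chance.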