
Model Stealing

Copying a model's behavior, weights, or capabilities without authorization.

Definition

Model stealing is an attempt to copy a model's behavior, weights, or capabilities without authorization. A common form is model extraction, in which an attacker repeatedly queries a deployed system and trains a substitute model on its outputs. Model stealing threatens the intellectual property and competitive advantage of model providers and can also expose private training data. Defenses include rate limiting, output watermarking, and detecting extraction-like query patterns.
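
As an illustration of the rate-limiting defense, the following is a minimal sketch of a per-client sliding-window limiter. The class name `SlidingWindowRateLimiter`, the `allow` method, the client identifiers, and the limit of 3 queries per 60 seconds are all hypothetical choices made for this example, not part of any particular provider's API. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Hypothetical limiter: at most `max_queries` per client in any `window_seconds` span."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # client_id -> timestamps of that client's accepted queries, oldest first
        self._history = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True and record the query if the client is under its limit."""
        timestamps = self._history[client_id]
        # Drop accepted queries that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False  # over the limit: reject without recording
        timestamps.append(now)
        return True


# Assumed policy for illustration: 3 queries per 60 seconds per client.
limiter = SlidingWindowRateLimiter(max_queries=3, window_seconds=60.0)
decisions = [limiter.allow("client-a", t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)  # [True, True, True, False, True]
```

The fourth query is rejected because three queries were already accepted within the window, and the fifth is accepted once the earliest ones have aged out. A limiter like this only slows extraction; on its own it would not stop an attacker who spreads queries across many accounts, which is why it is usually paired with the other defenses listed above.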