Optimization

Model Compression

Shrinking a trained model so it runs faster and on smaller hardware.

Definition

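Model compression reduces the size and compute requirements of a trained model so that it runs efficiently on servers or edge devices. Common techniques include quantization (storing weights at lower numeric precision), pruning (removing weights that contribute little), knowledge distillation (training a smaller model to mimic a larger one), and weight clustering (sharing a small set of values across many weights); they can be used alone or together. Savings are often substantial: storing weights as 8-bit integers instead of 32-bit floats, for example, cuts weight storage roughly fourfold. A smaller model lowers deployment cost, improves response times, and makes offline, on-device AI possible.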
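
Example

To make two of these techniques concrete, here is a minimal sketch in plain Python of symmetric 8-bit quantization and magnitude-based pruning applied to a short list of weights. The function names (quantize_int8, dequantize, prune_by_magnitude) and the sample weights are hypothetical and chosen for illustration only; production toolkits typically work on whole tensors, often per channel, and calibrate on real data.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 so an all-zero tensor does not divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in quantized]


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Illustrative weights, not taken from any real model.
weights = [0.82, -0.03, 0.45, -1.27, 0.002, 0.64, -0.51, 0.09]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

pruned = prune_by_magnitude(weights, sparsity=0.5)

print("quantized:", quantized)
print("max round-trip error:", max_error)
print("pruned:", pruned)
```

Each quantized value fits in one byte instead of four, at the cost of a rounding error of at most half the scale per weight. The pruned list keeps only the four largest-magnitude weights; the zeros save space or time only when the storage format or runtime takes advantage of sparsity.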