Optimization
Model Compression
Shrinking a trained model so it runs faster and on smaller hardware.
Definition
Model compression reduces the size and resource needs of a trained model so it can run efficiently on servers or edge devices. It draws on techniques such as quantization, pruning, knowledge distillation, and weight clustering, used alone or in combination, to cut a model's memory and compute footprint. Storing weights as 8-bit integers instead of 32-bit floats, for example, shrinks weight storage by roughly 4x. The result lowers deployment cost, improves response times, and makes offline, on-device AI possible.
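To make two of the techniques named above concrete, here is a minimal sketch of magnitude pruning followed by symmetric 8-bit quantization, applied to a flat list of random weights. It uses only the Python standard library. The function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`), the 50% sparsity target, and the single per-tensor scale are illustrative assumptions, not the API of any particular framework, and real toolchains typically add calibration and fine-tuning steps that are omitted here.

```python
import random
from array import array


def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (hypothetical helper)."""
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    # The k-th smallest absolute value becomes the cut-off.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # array("b") stores signed 8-bit integers, one byte per value.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [v * scale for v in q]


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for trained weights

pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

# Storage if the weights were kept as 32-bit floats versus 8-bit integers.
float32_bytes = len(weights) * array("f").itemsize
int8_bytes = len(q) * q.itemsize

zero_fraction = sum(1 for w in pruned if w == 0.0) / len(pruned)
max_error = max(abs(a - b) for a, b in zip(restored, pruned))

print(f"size reduction from quantization: {float32_bytes / int8_bytes:.1f}x")
print(f"fraction of weights pruned to zero: {zero_fraction:.2f}")
print(f"largest rounding error: {max_error:.4f} (scale = {scale:.4f})")
```

In this sketch the size reduction comes from quantization alone. The zeros produced by pruning only save space or compute when the weights are stored in a sparse format or run on kernels that skip zeros. The rounding error of each weight is bounded by about half the scale, which is one reason accuracy is usually re-checked after compression.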