Optimization

Quantization-Aware Training

Simulating low precision during training so the model adapts to it.

Definition

Quantization-aware training (QAT) inserts fake quantization operations into the forward pass during training: values are rounded to the target low-precision grid and immediately converted back to floating point, so the model learns to compensate for the precision loss it will face at inference time. The result typically retains more accuracy after quantization, especially at INT8 or INT4, than a model trained only in full precision and quantized afterward. QAT costs more than post-training quantization because it requires additional training, but it is common when deploying to edge devices or cost-sensitive serving.
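As a rough illustration of what a fake quantization operation does, the sketch below rounds a value to a signed integer grid and converts it straight back to floating point. It is a minimal example under stated assumptions, not any particular framework's implementation: the function names, the symmetric per-tensor `scale`, and the use of a straight-through estimator for the backward pass are all choices made here for illustration.

```python
def quant_range(num_bits):
    """Signed integer range for a bit width, e.g. 8 -> (-128, 127)."""
    return -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1


def fake_quantize(x, scale, num_bits=8):
    """Forward pass: quantize x to the integer grid, then dequantize.

    The output is still a float, but it can only take values that the
    low-precision format can represent (assumed symmetric, no zero point).
    """
    qmin, qmax = quant_range(num_bits)
    q = round(x / scale)          # snap to the nearest grid point
    q = max(qmin, min(qmax, q))   # clip to the representable range
    return q * scale              # back to float


def fake_quantize_grad(x, scale, upstream_grad, num_bits=8):
    """Backward pass using a straight-through estimator (an assumption here).

    Rounding has zero gradient almost everywhere, so the gradient is passed
    through unchanged for in-range inputs and zeroed for clipped inputs.
    """
    qmin, qmax = quant_range(num_bits)
    in_range = qmin * scale <= x <= qmax * scale
    return upstream_grad if in_range else 0.0


# Hypothetical weights, quantized to a 4-bit grid.
weights = [0.013, -0.42, 0.26, 1.9]
scale = max(abs(w) for w in weights) / quant_range(4)[1]
for w in weights:
    w_q = fake_quantize(w, scale, num_bits=4)
    print(f"{w:+.3f} -> {w_q:+.3f} (error {w_q - w:+.3f})")
```

During training, the loss is computed from the fake-quantized values, so the rounding error shown in the last column becomes part of what the optimizer has to work around.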
