Optimization
Quantization-Aware Training
Simulating low precision during training so the model adapts to it.
Definition
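Quantization-aware training (QAT) inserts fake quantization operations into the model during training. Weights, and often activations, are rounded to the low-precision grid and immediately converted back to floating point, so the forward pass sees the same rounding and clipping error the model will face at inference time and training learns to compensate for it. Because rounding has zero gradient almost everywhere, the backward pass typically uses a straight-through estimator that passes gradients through the rounding step unchanged. After quantization, especially to INT8 or INT4, a QAT model usually retains noticeably more accuracy than a model trained only in full precision and quantized afterwards. QAT needs an extra training or fine-tuning run, so it costs more than post-training quantization (PTQ), but it is common when deploying to edge devices or cost-sensitive serving.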
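The mechanism can be illustrated with a minimal sketch in pure Python. It assumes symmetric per-tensor weight quantization with a scale recomputed from max|w| each step (treated as a constant, with no gradient), activations left in float, and a toy two-weight linear model. All function names (`fake_quantize`, `fake_quantize_grad`, `qat_train`, etc.) are hypothetical and do not correspond to any framework's API. The forward pass uses fake-quantized weights, while gradients reach the float "latent" weights through a straight-through estimator.

```python
# Minimal QAT sketch in pure Python (no framework). Illustrative assumptions:
# symmetric per-tensor weight quantization, scale = max|w| / qmax recomputed each
# step and treated as a constant (no gradient), activations kept in float.


def int_range(bits):
    """Signed integer range for a bit width, e.g. 8 -> (-128, 127)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def symmetric_scale(values, bits):
    """Map the largest magnitude onto qmax (one common choice among several)."""
    _, qmax = int_range(bits)
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def fake_quantize(x, scale, bits):
    """Forward: round to the integer grid, clamp, dequantize back to float.

    The output is a float restricted to k * scale for integer k in [qmin, qmax],
    so the loss sees the rounding error integer inference would add.
    Note: Python's round() rounds halfway cases to even.
    """
    qmin, qmax = int_range(bits)
    return min(max(round(x / scale), qmin), qmax) * scale


def fake_quantize_grad(x, scale, bits):
    """Backward (straight-through estimator): rounding is treated as identity,
    so the upstream gradient passes unchanged unless the value was clamped."""
    qmin, qmax = int_range(bits)
    return 1.0 if qmin <= round(x / scale) <= qmax else 0.0


def qat_train(weights, xs, ys, bits=4, lr=0.1, steps=100):
    """Fit y = sum(w_i * x_i) by gradient descent on the mean squared error.

    The loss is computed with fake-quantized weights, but updates are applied to
    the float latent weights. Returns final latent weights and per-step losses.
    """
    weights = list(weights)
    losses = []
    n = len(xs)
    for _ in range(steps):
        scale = symmetric_scale(weights, bits)
        wq = [fake_quantize(w, scale, bits) for w in weights]
        grads = [0.0] * len(weights)
        loss = 0.0
        for x, y in zip(xs, ys):
            err = sum(w * xi for w, xi in zip(wq, x)) - y
            loss += err * err / n
            for i, xi in enumerate(x):
                # dLoss/dwq_i, chained through the STE to the latent weight w_i
                grads[i] += 2 * err * xi / n * fake_quantize_grad(weights[i], scale, bits)
        losses.append(loss)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, losses


if __name__ == "__main__":
    xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    ys = [0.3, 0.7, 1.0]  # produced by hypothetical target weights (0.3, 0.7)
    latent, losses = qat_train([0.9, -0.4], xs, ys, bits=4)
    scale = symmetric_scale(latent, 4)
    print("deployed 4-bit weights:", [fake_quantize(w, scale, 4) for w in latent])
    print(f"loss: first step {losses[0]:.4f}, last step {losses[-1]:.4f}")
```

The toy loop only demonstrates the mechanism (quantized forward pass, float updates). It does not reproduce the accuracy gains described above. Production frameworks provide their own fake-quantize modules and range observers, and often learn or calibrate the scale rather than recomputing it from max|w|.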