Post-Training Quantization
Quantizing an already-trained model to lower precision without retraining it.
Definition
Post-Training Quantization (PTQ) reduces the size and memory footprint of an already-trained network by storing its weights at lower precision, typically converting 16-bit floating-point weights to 8-bit or 4-bit integers. It needs far less computation than training and involves no retraining, which makes it convenient and fast to apply. The tradeoff is that it can introduce some accuracy loss, especially for smaller models or under aggressive compression (for example, 4-bit); in those cases quantization-aware training may be preferred.
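As a rough illustration of the idea, here is a minimal sketch of one simple PTQ scheme: symmetric, per-tensor quantization of 16-bit float weights to 8-bit integers using NumPy. The function names (`quantize_int8`, `dequantize`), the random stand-in weight matrix, and the choice of a single scale per tensor are assumptions made for this example; production toolkits usually use finer-grained scales and calibration data.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization of a float array to int8.

    Hypothetical helper for illustration: one scale for the whole tensor,
    chosen so the largest absolute weight maps to 127.
    """
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale


# Stand-in for a trained layer's weights (assumed data, not a real model).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float16)

# Quantize once, after training; no gradient steps are involved.
w32 = w.astype(np.float32)
q, scale = quantize_int8(w32)
w_hat = dequantize(q, scale)

# Storage drops from 2 bytes to 1 byte per weight.
bytes_before = w.nbytes
bytes_after = q.nbytes

# Worst-case rounding error introduced by quantization.
max_err = float(np.max(np.abs(w32 - w_hat)))
```

In this sketch the reconstruction error per weight is bounded by half the scale, which is why a few large outlier weights (which inflate the scale) can hurt accuracy, and why going to 4 bits with only 16 levels is more aggressive than 8 bits.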