Optimization

Post-Training Quantization

Quantizing an already-trained model to lower precision without retraining it.

Definition

Post-Training Quantization (PTQ) reduces the size and memory footprint of an already-trained network by storing its weights at lower precision, typically converting 16-bit floating-point weights to 8-bit or 4-bit integers. It requires little computation and no retraining, which makes it fast and convenient to apply. The tradeoff is some loss of accuracy, most noticeable in smaller models or under aggressive compression; in those cases quantization-aware training may be preferred.
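
As a rough illustration of the float-to-integer conversion described above, the sketch below maps a list of weights to 8-bit integers and back. It assumes symmetric per-tensor scaling, which is one common scheme and not the only one; the function names quantize_int8 and dequantize are hypothetical and chosen only for this example.

```python
import random


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale.

    Assumption: symmetric per-tensor scaling (illustrative, not the only scheme).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero when every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Stand-in for trained weights; no retraining is involved at any point.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f}  max abs error={max_err:.6f}")
```

Each restored weight differs from the original by at most half a quantization step (scale / 2), which is where the accuracy loss comes from. Using fewer bits means a larger step and therefore a larger error.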
