Optimization

GPTQ

A method that shrinks a trained model by storing its weights at lower precision (fewer bits per weight), with little accuracy loss.

Definition

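GPTQ is a post-training quantization technique: it shrinks a model by storing each weight with fewer bits, such as 3 or 4 bits instead of 16. It is applied in one pass after training, with no retraining, while keeping accuracy close to the original. It works layer by layer, quantizing a layer's weights a few at a time and using approximate second-order information (a Hessian estimated from a small set of calibration inputs) to adjust the not-yet-quantized weights so that they compensate for the error each rounding step introduces. This lets even very large models, up to roughly 175 billion parameters, be quantized in a few hours on a single GPU, and the format is widely supported by inference tooling for cheaper deployment.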
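The error-correcting step described above can be sketched for a single layer in NumPy. This is a simplified illustration, not the reference implementation: the function names (`make_grid`, `quantize`, `round_to_nearest`, `gptq_like`), the damping value, the per-row min-max grid, and the random weights and calibration inputs are all assumptions made for the example. It also leaves out the blocking and batching tricks that make the real method fast on large models.

```python
import numpy as np

def make_grid(W, bits=4):
    """Per-row min-max grid: returns (scale, zero_point, max_level)."""
    max_level = 2 ** bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / max_level
    zero = np.round(-w_min / scale)
    return scale, zero, max_level

def quantize(w, scale, zero, max_level):
    """Snap values to the nearest grid level and map back to floats."""
    q = np.clip(np.round(w / scale) + zero, 0, max_level)
    return scale * (q - zero)

def round_to_nearest(W, bits=4):
    """Baseline: round every weight independently, no error correction."""
    scale, zero, max_level = make_grid(W, bits)
    return quantize(W, scale, zero, max_level)

def gptq_like(W, X, bits=4, damp=0.01):
    """Quantize W (out x in) column by column, correcting later columns.

    X holds calibration inputs with shape (in, n_samples).
    """
    W = W.astype(np.float64).copy()          # work on a copy
    scale, zero, max_level = make_grid(W, bits)

    # Hessian of the layer-wise squared error, lightly damped for stability.
    H = X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])

    # Upper Cholesky factor U of H^-1 (H^-1 = U.T @ U).
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        Q[:, i] = quantize(W[:, i:i + 1], scale, zero, max_level)[:, 0]
        # Scaled rounding error of this column ...
        err = (W[:, i] - Q[:, i]) / U[i, i]
        # ... pushed onto the columns that are not yet quantized.
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q

# Toy layer with correlated input features (made-up data for illustration).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))
X = rng.normal(size=(32, 256)) + 2.0 * rng.normal(size=(1, 256))

def output_error(Q):
    """How far the quantized layer's outputs are from the original's."""
    return np.linalg.norm(W @ X - Q @ X)

err_rtn = output_error(round_to_nearest(W))
err_gptq = output_error(gptq_like(W, X))
print(f"round-to-nearest output error: {err_rtn:.3f}")
print(f"error-corrected output error:  {err_gptq:.3f}")
```

Both variants store every weight on the same 16-level grid, so they use the same amount of memory. The only difference is that the second one chooses the levels so that the layer's outputs on the calibration inputs stay closer to the original, which is the idea behind the accuracy retention described above.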
