Optimization
GGUF
A file format for packaging quantized models to run locally on CPUs and consumer GPUs.
Definition
GGUF is a model file format associated with the llama.cpp project. It packages model weights, usually quantized, together with key-value metadata (such as architecture hyperparameters and tokenizer data) for efficient local inference, including on CPUs and consumer GPUs. Because a single file typically carries what is needed to load and run a model, GGUF has become a common distribution format for running open models on personal hardware. Tools such as Ollama and LM Studio read it directly.
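To make the "one file with weights and metadata" idea concrete, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It assumes the layout described in the public GGUF specification for version 2 and later: a 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key-value counts, all little-endian. The function name `read_gguf_header` and the sample values are hypothetical, and the header bytes are synthetic rather than taken from a real model.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_header(stream):
    """Parse the fixed-size GGUF header from a binary stream.

    Assumes little-endian encoding and GGUF version >= 2, where the
    tensor count and metadata key-value count are unsigned 64-bit integers.
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")

    (version,) = struct.unpack("<I", stream.read(4))
    if version < 2:
        # Version 1 stored the counts as 32-bit values; not handled here.
        raise ValueError(f"unsupported GGUF version: {version}")

    tensor_count, metadata_kv_count = struct.unpack("<QQ", stream.read(16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }


# Synthetic header for illustration only (values are made up).
fake_header = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
header = read_gguf_header(io.BytesIO(fake_header))
print(header)
```

With a real model you would pass `open(path, "rb")` instead of the in-memory stream. The metadata entries and tensor descriptors follow the header, and parsing them requires the type tables from the specification, which this sketch leaves out.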