llama.cpp

An open-source C/C++ project for running language models on everyday hardware.

Definition

llama.cpp is an open-source C and C++ project for running large language models efficiently on everyday hardware, including CPUs and consumer GPUs. It relies heavily on quantization, which stores model weights at reduced precision (for example, 8 or 4 bits per weight instead of 16) so that models fit in limited memory. It also introduced the GGUF file format, which is now widely used for local inference. Many local-AI tools and desktop applications are built on it.
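To make the quantization idea concrete, here is a simplified sketch of symmetric 8-bit block quantization, loosely modeled on llama.cpp's Q8_0 format: weights are split into blocks of 32, and each block stores one scale plus 32 small integers. This is an illustration under stated assumptions, not llama.cpp's code or API. The function names (`quantize_block`, `dequantize_block`, `quantize`, `q8_0_bytes`) are hypothetical, and the real format stores the scale as a 16-bit float rather than a Python float.

```python
import math

# Hypothetical sketch, loosely modeled on the Q8_0 layout:
# 32 weights per block, one scale per block, integers in [-127, 127].
BLOCK_SIZE = 32


def quantize_block(x):
    """Quantize one block of floats to (scale, list of ints in [-127, 127])."""
    amax = max(abs(v) for v in x)            # largest magnitude in the block
    scale = amax / 127.0                     # maps amax onto the int8 limit
    inv = 1.0 / scale if scale != 0.0 else 0.0
    return scale, [round(v * inv) for v in x]  # round to nearest integer


def dequantize_block(scale, q):
    """Recover approximate floats from a quantized block."""
    return [scale * v for v in q]


def quantize(weights):
    """Split a flat list of weights into blocks and quantize each block."""
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]


def q8_0_bytes(n_params):
    """Storage estimate: 2-byte fp16 scale + 32 one-byte ints per block."""
    return math.ceil(n_params / BLOCK_SIZE) * (2 + BLOCK_SIZE)


# Round-trip a small synthetic "weight matrix" of 64 values (two blocks).
weights = [math.sin(i) * 0.5 for i in range(64)]
blocks = quantize(weights)
restored = [v for s, q in blocks for v in dequantize_block(s, q)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"max round-trip error: {max_err:.5f}")
print(f"7B params at 16 bits: {7e9 * 2 / 1e9:.1f} GB")
print(f"7B params in this 8-bit layout: {q8_0_bytes(7_000_000_000) / 1e9:.2f} GB")
```

The per-block scale is the key design choice: because each group of 32 weights gets its own scale, one large outlier only costs precision within its own block, and the per-value rounding error stays at most half that block's scale. At 34 bytes per 32 weights (about 8.5 bits per weight), a 7-billion-parameter model drops from roughly 14 GB at 16-bit precision to about 7.4 GB. Lower-bit schemes shrink it further at some cost in accuracy.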