
Text Generation Inference

Hugging Face's open-source server for deploying language models in production.

Definition

Text Generation Inference (TGI) is Hugging Face's open-source server for deploying large language models in production. It combines continuous batching, optimized attention kernels, and quantization support to serve open models efficiently on GPUs. It is one of several inference engines used to turn a downloaded model into a fast, scalable endpoint.
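
To make "inference endpoint" concrete, here is a minimal sketch of a client calling a running server over HTTP. It assumes a server is already listening at a base URL you supply and that it exposes the `/generate` route with an `inputs` / `parameters` JSON body, as described in the project's API documentation. The helper names `build_generate_request` and `generate` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=64):
    """Build a POST request for the server's /generate route.

    base_url is an assumption: wherever your server happens to be listening.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_new_tokens=64, timeout=30.0):
    """Send the prompt and return the generated text from the JSON response."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumes the response is a JSON object with a "generated_text" field.
    return body["generated_text"]
```

With a server running locally, a call such as `generate("http://localhost:8080", "What is continuous batching?")` would return the completion as a string. The address and port here are placeholders, not defaults to rely on.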