Text Generation Inference
Hugging Face's open-source server for deploying language models in production.
Definition
Text Generation Inference (TGI) is Hugging Face's open-source server for deploying large language models in production. It serves open models efficiently on GPUs through continuous batching, which adds incoming requests to a batch that is already running instead of waiting for that batch to finish, together with optimized attention kernels and quantization support. It is one of several inference engines used to turn downloaded model weights into a fast, scalable inference endpoint.
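To make "inference endpoint" concrete, here is a minimal sketch of a client calling a running TGI server over HTTP. It assumes the server exposes TGI's `/generate` route, which accepts a JSON body with `inputs` and `parameters` and returns a `generated_text` field. The address `http://localhost:8080` and the helper names `build_generate_request` and `generate` are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Assumption: a TGI server is already running and reachable at this address.
# The host and port are placeholders, not defaults guaranteed by TGI.
TGI_URL = "http://localhost:8080"


def build_generate_request(
    prompt: str,
    max_new_tokens: int = 64,
    base_url: str = TGI_URL,
) -> urllib.request.Request:
    """Build a POST request for the /generate route (hypothetical helper)."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    prompt: str,
    max_new_tokens: int = 64,
    base_url: str = TGI_URL,
    timeout: float = 30.0,
) -> str:
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the response is a JSON object with a "generated_text" key.
    return body["generated_text"]
```

With a server running, `generate("What is continuous batching?", max_new_tokens=50)` would return the model's completion as a string. The client sends one request at a time; batching across concurrent callers happens inside the server, so no client-side batching logic is needed.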