Inference & Serving
SGLang
A fast LLM serving framework whose RadixAttention reuses shared prompt prefixes.
Definition
SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models, developed at UC Berkeley and hosted by LMSYS. Its signature feature is RadixAttention, which automatically detects and reuses the KV cache (the model's stored attention keys and values for text it has already processed) across requests that share a prompt prefix, with no manual setup. This matters most for agent, multi-turn, and RAG (retrieval-augmented generation, where fetched documents are added to the prompt) workloads, in which the same instructions or context repeat. It also batches requests and overlaps work efficiently, and it is widely used to generate model outputs during reinforcement learning training.
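The prefix-reuse idea can be illustrated with a toy sketch. This is not SGLang's implementation: the `PrefixCache` class, its method names, and the token IDs are hypothetical, and a plain token-level trie stands in for a radix tree (which would compress runs of tokens into single edges). The sketch only counts how many leading tokens of a request could reuse cached entries; it stores no real key/value tensors and omits eviction and memory management.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps the next token ID to the child node that extends the prefix.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy token-level trie standing in for a radix tree of cached KV entries."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record that entries now exist for every prefix of `tokens`."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())

    def serve(self, tokens):
        """Return (reused, computed) token counts for one request."""
        reused = self.match_prefix(tokens)
        self.insert(tokens)
        return reused, len(tokens) - reused


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # hypothetical token IDs for shared instructions

first = cache.serve(system_prompt + [10, 11])       # nothing cached yet
second = cache.serve(system_prompt + [20, 21, 22])  # shares the 4-token prefix

print(first)   # (0, 6)
print(second)  # (4, 3)
```

The second request recomputes only its three new tokens because the four-token shared prefix is already present, which is the saving that grows with long system prompts, conversation histories, or retrieved documents.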