Inference & Serving
Prompt Caching
Storing a processed prompt prefix so repeated requests skip recomputing it.
Definition
Prompt caching stores the processed state of a prompt prefix, such as a long system prompt or reference document, so repeated requests can skip recomputing it. It builds on prefix caching: when a later request begins with exactly the same tokens, the server reuses the cached key-value (KV) attention state for those tokens and computes only the remainder. Many providers expose it as a billing and latency optimization, since the cached portion costs less to serve.
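The reuse described above can be illustrated with a toy sketch. It is not any provider's implementation: `ToyPrefixCache`, `run`, and `cache_upto` are hypothetical names, and a plain tuple of tokens stands in for the KV tensors a real server would keep. The counter shows how much work the second request avoids.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefixCache:
    """Maps a token prefix to its (simulated) processed state."""
    store: dict = field(default_factory=dict)
    tokens_computed: int = 0  # tokens that actually went through "processing"

    def _process(self, state: tuple, tokens: list) -> tuple:
        # Stand-in for the expensive forward pass that builds KV state.
        self.tokens_computed += len(tokens)
        return state + tuple(tokens)

    def run(self, tokens: list, cache_upto: int) -> tuple:
        """Process `tokens`, caching the state of the first `cache_upto` tokens."""
        prefix = tuple(tokens[:cache_upto])
        if prefix in self.store:
            state = self.store[prefix]       # hit: skip recomputing the prefix
        else:
            state = self._process((), list(prefix))
            self.store[prefix] = state       # miss: compute once and store
        return self._process(state, tokens[cache_upto:])


cache = ToyPrefixCache()
system = ["You", "are", "a", "helpful", "assistant"]

cache.run(system + ["Hi"], cache_upto=len(system))
first = cache.tokens_computed                # prefix + suffix were processed

cache.run(system + ["Bye", "now"], cache_upto=len(system))
second = cache.tokens_computed - first       # only the new suffix was processed
```

The lookup key is the exact token sequence, so changing even one token in the prefix produces a miss. This is why the stable content (system prompt, reference documents) is usually placed at the start of the prompt and the variable content at the end.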