Inference & Serving
TTFT
How long it takes, after a request is sent, for the model to emit its first output token.
Definition
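Time-to-first-token (TTFT) measures the latency between sending a request and receiving the first generated token. It is usually dominated by the prefill phase (processing the prompt), so it grows with prompt length; time spent waiting in the server's queue and on the network also counts toward it. TTFT is a key user-experience metric for streaming applications, because it determines how long the user waits before any output appears. Prefix caching and chunked prefill are common ways to reduce it: prefix caching reuses computation already done for a shared prompt prefix, and chunked prefill splits long prompts into pieces so they do not block other requests.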
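As a rough illustration of the definition, the sketch below times a streamed response on the client side. The names `measure_ttft` and `fake_stream` are hypothetical and do not come from any particular library; `fake_stream` stands in for a real streaming client and simulates prefill with a sleep.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(
    send_request: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (ttft_seconds, token_count) for one streamed request.

    send_request is assumed to issue the request and return an iterable
    that yields tokens as they arrive. ttft_seconds is None if the
    stream produced no tokens.
    """
    start = clock()  # the request is sent immediately after this reading
    ttft: Optional[float] = None
    count = 0
    for _token in send_request():
        if ttft is None:
            ttft = clock() - start  # first token has just arrived
        count += 1
    return ttft, count


def fake_stream(
    prefill_s: float = 0.05,
    decode_s: float = 0.01,
    tokens: Tuple[str, ...] = ("Hello", ",", " world"),
) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model client."""
    time.sleep(prefill_s)  # simulated prefill before the first token
    for i, tok in enumerate(tokens):
        if i > 0:
            time.sleep(decode_s)  # simulated per-token decode time
        yield tok


if __name__ == "__main__":
    ttft, n = measure_ttft(fake_stream)
    print(f"TTFT: {ttft * 1000:.1f} ms over {n} tokens")
```

Measured this way, TTFT includes network and queueing delay as well as prefill, which matches what a user of a streaming application actually experiences. Server-side measurements that start the clock when the request is scheduled will report smaller values.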