Inference & Serving

TTFT

How long after a request before the model emits its first output token.

Definition

Time-to-first-token measures the latency between sending a request and receiving the first generated token. It is dominated by the prefill phase (processing the prompt) and is a key user-experience metric for streaming applications. Prefix caching and chunked prefill are common ways to reduce it.

TTFT

Definition

Related terms