Prompt Compression

Shortening a prompt before inference to cut cost and latency.

Definition

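Prompt compression reduces inference cost and latency by shrinking the input before it reaches the model. There are three broad approaches: hard compression, which deletes redundant or low-information words and sentences while leaving the rest as plain text; soft compression, which distills the text into a small set of learned embedding vectors that the model reads in place of the original tokens; and selective inclusion, which keeps only the most relevant retrieved passages. Tools such as LLMLingua use a smaller language model to score how much information each token carries and drop the low-scoring ones, so long inputs can be compressed aggressively with little loss in accuracy.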
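As a rough illustration of two of these approaches, the sketch below uses only the Python standard library. It is not how LLMLingua or any particular tool works: the function names (`drop_duplicate_sentences`, `select_passages`), the naive sentence splitter, and the word-overlap relevance score are all assumptions made for the example. Soft compression is left out because it requires a trained encoder.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def drop_duplicate_sentences(text: str) -> str:
    """Hard compression (toy version): keep the first copy of each sentence."""
    seen: set[str] = set()
    kept: list[str] = []
    for sentence in split_sentences(text):
        # Normalise case and punctuation so near-identical repeats match.
        key = re.sub(r"\W+", " ", sentence.lower()).strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


def select_passages(query: str, passages: list[str], max_words: int) -> list[str]:
    """Selective inclusion (toy version): keep the passages that share the
    most words with the query until a word budget is used up."""
    query_terms = set(re.findall(r"\w+", query.lower()))

    def score(passage: str) -> int:
        # Crude relevance: number of distinct words shared with the query.
        return len(set(re.findall(r"\w+", passage.lower())) & query_terms)

    chosen: list[str] = []
    used = 0
    for passage in sorted(passages, key=score, reverse=True):
        n_words = len(passage.split())
        if score(passage) > 0 and used + n_words <= max_words:
            chosen.append(passage)
            used += n_words
    return chosen


if __name__ == "__main__":
    print(drop_duplicate_sentences(
        "The cache is warm. The cache is warm. Restart the server."
    ))
    print(select_passages(
        "how to restart the server",
        [
            "Restart the server with systemctl.",
            "Bananas are yellow.",
            "The server logs live in /var/log.",
        ],
        max_words=6,
    ))
```

A production system would likely replace the word-overlap score with embedding similarity or a reranker, and would count tokens with the target model's tokenizer rather than counting whitespace-separated words.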