Long-Context Compression

Shrinking large inputs to fit a context window while keeping what matters.

Definition

Long-context compression covers techniques for fitting more information into a finite context window than would fit verbatim. Approaches include summarizing sections into condensed text, retrieving only the chunks relevant to the current query, compressing or reusing the key-value cache (the model's saved short-term memory of the text so far), and training models that distill documents into a small number of compact tokens (the small word-pieces a model reads). Compression trades completeness for feasibility: whatever is dropped is unavailable to the model, so the method must be designed to preserve the details the downstream task actually needs.
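
As a rough illustration of the retrieval approach, the sketch below keeps only the chunks that share words with a query, subject to a size budget. It is a minimal example and not a production method: the function names `tokenize` and `compress_by_retrieval` are hypothetical, word overlap stands in for a real relevance score, and word count stands in for a real token count.

```python
import re


def tokenize(text):
    """Split text into lowercase words (a crude stand-in for model tokens)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compress_by_retrieval(chunks, query, budget):
    """Keep the chunks most relevant to the query within a word budget.

    Relevance is the number of distinct query words a chunk contains.
    This scoring rule is an assumption made for illustration only.
    """
    query_terms = set(tokenize(query))
    scored = []
    for index, chunk in enumerate(chunks):
        words = tokenize(chunk)
        overlap = len(query_terms & set(words))
        scored.append((overlap, index, len(words)))

    # Most relevant first; ties are broken by original position.
    scored.sort(key=lambda item: (-item[0], item[1]))

    kept = []
    used = 0
    for overlap, index, size in scored:
        if overlap == 0:
            continue  # nothing in common with the query
        if used + size > budget:
            continue  # would exceed the budget, try a smaller chunk
        kept.append(index)
        used += size

    # Return the surviving chunks in their original order.
    return [chunks[i] for i in sorted(kept)]


chunks = [
    "The invoice total is 420 dollars and is due on March 3.",
    "Our office has a new coffee machine on the second floor.",
    "Late payment of the invoice incurs a 5 percent fee.",
]
query = "When is the invoice due?"

for budget in (12, 22):
    print(budget, compress_by_retrieval(chunks, query, budget))
```

With a budget of 12 words only the first chunk survives, so the late-fee detail is lost; with 22 words both invoice chunks fit. This is the completeness-for-feasibility trade in miniature: the budget and the relevance score together decide which details the model never sees.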