Data

Document Chunking

Splitting long documents into shorter segments that can be embedded and retrieved.

Definition

Document chunking divides long texts into shorter segments that fit within a model's context window and can be individually embedded and indexed for retrieval. Strategies range from fixed-size splitting, to sentence- or paragraph-based splitting, to semantic chunking, which groups topically coherent passages. Chunk size and overlap are the key settings: smaller chunks tend to match a query more precisely but carry less surrounding context, while overlap repeats text across neighboring chunks so that content near a boundary is not cut off. Together they shape retrieval precision and recall in retrieval-augmented generation systems.
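Example

As a rough illustration of the two simplest strategies named above, the sketch below shows fixed-size splitting with overlap and a naive sentence-based splitter. The function names (`chunk_fixed`, `chunk_by_sentences`), the character-based sizes, and the regex used to detect sentence ends are all assumptions made for this example; real systems usually measure size in tokens and use a proper sentence segmenter.

```python
import re


def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters (illustrative, character-based).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def chunk_by_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of up to max_chars characters.

    Sentence ends are guessed with a simple regex (an assumption of this
    sketch); a single sentence longer than max_chars is kept whole.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For instance, `chunk_fixed("abcdefghij", chunk_size=4, overlap=2)` yields four chunks in which each chunk repeats the last two characters of the previous one. Each resulting chunk would then be embedded and indexed separately; that step is left out here because it depends on the embedding model and vector store in use.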