
Multimodal RAG

Retrieval-augmented generation that draws on text, images, and other media.

Definition

Multimodal RAG extends retrieval-augmented generation (RAG) to more than one modality. It retrieves relevant text, images, or other media and passes them to a model as context. It often relies on multimodal embeddings, which map different modalities into a shared vector space. A query in one form (for example, a text question) can then match content in another (for example, a diagram). This grounds answers in source material that includes diagrams, photos, or charts, not just text.
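The retrieval step can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline. The hand-set three-dimensional vectors are assumptions standing in for the output of a real multimodal encoder such as a CLIP-style model. The names `Item`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. Images are represented by a file-name placeholder, whereas a real system would pass the image itself to a vision-capable model.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    id: str
    modality: str         # "text" or "image"
    content: str          # text body, or a file name standing in for the image
    vector: list[float]   # hypothetical embedding in a shared text-image space


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: list[float], corpus: list[Item], k: int = 2) -> list[Item]:
    """Rank every item against the query, regardless of its modality."""
    return sorted(corpus, key=lambda it: cosine(query_vec, it.vector), reverse=True)[:k]


def build_prompt(question: str, items: list[Item]) -> str:
    """Assemble retrieved items into context for a (hypothetical) generator."""
    lines = []
    for it in items:
        if it.modality == "image":
            # Placeholder only: a real system would attach the image itself.
            lines.append(f"[image {it.id}: {it.content}]")
        else:
            lines.append(f"[text {it.id}] {it.content}")
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


# Toy corpus mixing text and images; the vectors are invented for illustration.
corpus = [
    Item("fig-arch", "image", "architecture_diagram.png", [0.9, 0.1, 0.0]),
    Item("doc-arch", "text", "The service has an API gateway, a worker pool, and a database.", [0.8, 0.3, 0.1]),
    Item("photo-team", "image", "team_photo.jpg", [0.1, 0.9, 0.1]),
    Item("doc-pricing", "text", "Usage is billed per request.", [0.0, 0.2, 0.9]),
]

# Stand-in for encoding the text question with the same multimodal encoder.
query_vec = [0.85, 0.2, 0.05]
hits = retrieve(query_vec, corpus)
print(build_prompt("How is the system structured?", hits))
```

With these toy vectors, the text question retrieves both the architecture diagram and the matching text passage. The unrelated photo and pricing note rank lower. Because all items share one vector space, a single similarity ranking covers every modality, which is the property the definition attributes to multimodal embeddings.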