Multimodal RAG
Retrieval-augmented generation that draws on text, images, and other media.
Definition
Multimodal RAG extends retrieval-augmented generation (RAG) across more than one modality: it retrieves relevant text, images, or other media and passes them to a model as context. It often relies on multimodal embeddings, which map different media types into a shared vector space so that a query in one modality (for example, a text question) can match content in another (for example, a chart). This grounds answers in source material that includes diagrams, photos, or charts, not just text.
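To make the cross-modal matching step concrete, here is a minimal sketch in Python. It assumes every item, whatever its modality, has already been embedded into one shared vector space. The three-dimensional vectors, the item names, and the helpers `retrieve` and `build_context` are all hypothetical and chosen for illustration. A real system would get its embeddings from a multimodal embedding model and would pass the retrieved media themselves, not just labels, to the generator.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    modality: str           # "text" or "image"
    ref: str                # hypothetical identifier for the stored content
    embedding: List[float]  # made-up vector in the shared embedding space


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: List[float], index: List[Item], k: int = 2) -> List[Item]:
    """Return the k items most similar to the query, regardless of modality."""
    ranked = sorted(
        index,
        key=lambda item: cosine(query_embedding, item.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_context(items: List[Item]) -> str:
    """Format retrieved items as a context listing for the generator."""
    return "\n".join(f"[{item.modality}] {item.ref}" for item in items)


# Toy index mixing text and images (all values are illustrative assumptions).
INDEX = [
    Item("text", "passage: quarterly revenue summary", [0.9, 0.1, 0.0]),
    Item("image", "chart: revenue_by_quarter.png", [0.8, 0.2, 0.1]),
    Item("image", "photo: office_party.jpg", [0.0, 0.1, 0.9]),
]

# Stands in for the embedding of a text question about revenue.
query_embedding = [1.0, 0.0, 0.0]

top = retrieve(query_embedding, INDEX, k=2)
context = build_context(top)
print(context)
```

In this toy setup the text query retrieves both a text passage and a chart image, because both lie close to the query in the shared space, while the unrelated photo is left out. That is the behavior the definition describes: one query form matching content in another.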