Multimodal
AI that works with more than one kind of input or output, such as text, images, and audio.
Definition
Multimodal AI can take in or produce more than one kind of data, such as text, images, audio, or video. A multimodal model might describe a photo, answer questions about a chart, or generate an image from a sentence. Combining modalities lets a single model handle a wider range of tasks and brings it closer to how people perceive the world, through several senses at once.