Multimodal

Omni Model

A single model that takes in and produces several modalities at once.

Definition

An omni model is a multimodal model built to take in and produce several modalities (such as text, images, and audio) within one system, rather than chaining separate components together, for example speech recognition feeding a text-only language model feeding text-to-speech. Because inputs and outputs are handled end to end, with no hand-offs between components, it aims for low-latency, real-time interaction across modalities, for example holding a spoken conversation while also reading and describing images.
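
The contrast between a chained pipeline and a single omni system can be sketched in a few lines of Python. This is only an illustrative toy: the names `Part`, `cascaded_reply`, and `omni_reply` are hypothetical, the "models" are string-formatting stubs rather than real networks, and the hop count is a rough stand-in for latency, not a measurement.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Part:
    """One piece of a turn. Hypothetical type for illustration only."""
    modality: str  # "text", "image", or "audio"
    content: str   # placeholder payload; a real system would carry tensors or bytes


def cascaded_reply(audio_in: Part) -> Tuple[List[Part], int]:
    """Chained design: three separate stub components, one hop each."""
    hops = 0
    transcript = Part("text", f"transcript({audio_in.content})")  # speech recognition stub
    hops += 1
    answer = Part("text", f"reply({transcript.content})")         # text-only language model stub
    hops += 1
    speech = Part("audio", f"speech({answer.content})")           # text-to-speech stub
    hops += 1
    return [speech], hops


def omni_reply(parts: List[Part]) -> Tuple[List[Part], int]:
    """Omni design: one stub model takes every modality and emits several."""
    seen = "+".join(sorted({p.modality for p in parts}))
    outputs = [
        Part("audio", f"speech_about({seen})"),
        Part("text", f"caption_about({seen})"),
    ]
    return outputs, 1  # a single end-to-end call


user_turn = [
    Part("audio", "what is in this picture?"),
    Part("image", "photo.png"),
]

# The chained pipeline only accepts the audio part; the image has no way in.
cascade_out, cascade_hops = cascaded_reply(user_turn[0])
omni_out, omni_hops = omni_reply(user_turn)

print(cascade_hops, omni_hops)
print([p.modality for p in omni_out])
```

In this toy, the chained version needs three hand-offs and cannot use the image at all, while the omni version receives the audio and the image together and answers in both speech and text from one call.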