Multimodal
Omni Model
A single model that takes in and produces several modalities at once.
Definition
An omni model is a multimodal model built to take in and produce several modalities, such as text, images, and audio, within one system rather than chaining separate components together (for example, a speech recognizer feeding a text model feeding a speech synthesizer). By handling inputs and outputs end to end, it avoids the hand-offs between components and aims for low-latency, real-time interaction across modalities, for example holding a spoken conversation while also reading and describing images.
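The contrast between a chained design and a single system can be sketched in a few lines of Python. This is a toy illustration only: `Message`, `speech_to_text`, `text_model`, `text_to_speech`, and `ToyOmniModel` are hypothetical names invented here, the functions are stubs that manipulate placeholder strings rather than run real models, and the returned stage count is a stand-in for the number of hand-offs, not a latency measurement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    modality: str  # "text", "audio", or "image"
    payload: str   # placeholder string standing in for real data


# Chained design: three separate (hypothetical, stubbed) components.
def speech_to_text(audio: Message) -> Message:
    return Message("text", f"transcript({audio.payload})")


def text_model(prompt: Message) -> Message:
    return Message("text", f"reply({prompt.payload})")


def text_to_speech(text: Message) -> Message:
    return Message("audio", f"speech({text.payload})")


def cascaded_reply(audio: Message) -> tuple[Message, int]:
    """Pass the input through each component in turn; return output and stage count."""
    stages = [speech_to_text, text_model, text_to_speech]
    msg = audio
    for stage in stages:
        msg = stage(msg)
    return msg, len(stages)


# Omni design: one (hypothetical, stubbed) model takes mixed inputs
# and emits the requested output modality in a single call.
class ToyOmniModel:
    def respond(self, inputs: list[Message], output_modality: str) -> tuple[Message, int]:
        seen = "+".join(m.modality for m in inputs)
        return Message(output_modality, f"reply({seen})"), 1


if __name__ == "__main__":
    spoken = Message("audio", "hello")
    print(cascaded_reply(spoken))

    photo = Message("image", "photo")
    print(ToyOmniModel().respond([spoken, photo], "audio"))
```

In the chained version, the image could not be passed in at all without adding another component, and every stage boundary is a point where information such as tone of voice can be lost. The single `respond` call is meant to show only the shape of the interface, under the assumptions stated above.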