
Speech-Language Model

A model that joins speech understanding or generation with language ability.

Definition

A speech-language model combines speech understanding, speech generation, or both with language modeling in a single system. Rather than chaining a separate speech recognizer, text-only language model, and speech synthesizer, it processes audio and text together, so spoken input can flow more directly into reasoning and spoken output. This integration supports tasks such as spoken dialogue, voice assistants, and speech-to-speech translation.
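
As a rough illustration of what "processes audio and text together" can mean in practice, the sketch below shows one possible approach: mapping discrete audio codes and text tokens into a single shared id space, so that one sequence model could consume both. This is an assumption made for illustration, not a description of any particular system. The vocabulary sizes, the `Segment` type, and the functions `to_shared_ids` and `split_by_modality` are all hypothetical, and no actual model or audio codec is involved.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sizes, chosen only for this sketch.
TEXT_VOCAB_SIZE = 1000   # number of text token ids
AUDIO_VOCAB_SIZE = 500   # number of discrete audio codes


@dataclass
class Segment:
    modality: str    # "text" or "audio"
    ids: List[int]   # token ids local to that modality


def to_shared_ids(segments: List[Segment]) -> List[int]:
    """Flatten mixed segments into one sequence over a shared vocabulary.

    Text ids keep their values; audio ids are shifted past the text range.
    """
    shared: List[int] = []
    for seg in segments:
        if seg.modality == "text":
            limit, offset = TEXT_VOCAB_SIZE, 0
        elif seg.modality == "audio":
            limit, offset = AUDIO_VOCAB_SIZE, TEXT_VOCAB_SIZE
        else:
            raise ValueError(f"unknown modality: {seg.modality!r}")
        for token in seg.ids:
            if not 0 <= token < limit:
                raise ValueError(f"{seg.modality} id out of range: {token}")
            shared.append(token + offset)
    return shared


def split_by_modality(shared_ids: List[int]) -> List[Segment]:
    """Inverse mapping: group consecutive shared ids back into segments."""
    segments: List[Segment] = []
    for token in shared_ids:
        if not 0 <= token < TEXT_VOCAB_SIZE + AUDIO_VOCAB_SIZE:
            raise ValueError(f"shared id out of range: {token}")
        if token < TEXT_VOCAB_SIZE:
            modality, local = "text", token
        else:
            modality, local = "audio", token - TEXT_VOCAB_SIZE
        if segments and segments[-1].modality == modality:
            segments[-1].ids.append(local)
        else:
            segments.append(Segment(modality, [local]))
    return segments


# A spoken question (audio codes) followed by a text reply in one sequence.
turn = [Segment("audio", [3, 4]), Segment("text", [7])]
sequence = to_shared_ids(turn)
print(sequence)
print(split_by_modality(sequence))
```

In a cascaded pipeline, by contrast, only text would pass between the recognizer, the language model, and the synthesizer. A shared sequence like this is one way a single model could attend to both modalities at once.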