Multimodal
Speech to Speech
Converting spoken input directly into spoken output.
Definition
Speech to speech converts spoken input into spoken output, covering tasks such as real-time speech translation and voice conversion. A cascaded system handles this in separate stages: it transcribes the speech to text, translates the text, and re-synthesizes speech from the result. End-to-end systems instead map audio directly to audio. Skipping the intermediate text can preserve tone that transcription would discard, and it can reduce delay. Speech to speech underpins voice assistants and live interpretation systems that respond by speaking back.
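The structural difference between a cascade and an end-to-end system can be illustrated with a deliberately toy Python sketch. Everything in it is hypothetical. `Audio` stands in for a waveform as a list of words plus one pitch value per word, which represents tone. `asr`, `translate_text`, and `tts` are placeholder stages built on a tiny word lexicon, not real models. The sketch shows only where tone can be lost: the cascade drops pitch at the text boundary, while the single audio-to-audio function carries it through. A real end-to-end model learns this mapping rather than copying values, and the sketch does not model latency.

```python
from dataclasses import dataclass
from typing import List


# Toy stand-in for an audio signal (hypothetical): spoken words plus a
# per-word pitch value representing tone/prosody.
@dataclass
class Audio:
    words: List[str]
    pitch: List[float]


# Hypothetical word-level lexicon shared by both toy systems.
LEXICON = {"hello": "hola", "friend": "amigo"}
DEFAULT_PITCH = 1.0  # flat prosody the toy synthesizer falls back to


def asr(audio: Audio) -> List[str]:
    # Stage 1: transcribe to text; pitch is discarded here.
    return list(audio.words)


def translate_text(words: List[str]) -> List[str]:
    # Stage 2: text-to-text translation (word lookup, illustration only).
    return [LEXICON.get(w, w) for w in words]


def tts(words: List[str]) -> Audio:
    # Stage 3: re-synthesize with default pitch, since text carries none.
    return Audio(words, [DEFAULT_PITCH] * len(words))


def cascade(audio: Audio) -> Audio:
    # Three separate stages chained through intermediate text.
    return tts(translate_text(asr(audio)))


def end_to_end(audio: Audio) -> Audio:
    # A single audio-to-audio mapping: with no text bottleneck,
    # each word's pitch is carried through to the output.
    return Audio([LEXICON.get(w, w) for w in audio.words], list(audio.pitch))


if __name__ == "__main__":
    src = Audio(["hello", "friend"], [1.8, 0.6])
    print(cascade(src))     # Audio(words=['hola', 'amigo'], pitch=[1.0, 1.0])
    print(end_to_end(src))  # Audio(words=['hola', 'amigo'], pitch=[1.8, 0.6])
```

Both toy systems produce the same words, but only the end-to-end version keeps the original rising and falling pitch.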