Multimodal

Speech Recognition

Technology that converts spoken words into text; also called ASR.

Definition

Speech recognition is a technology that converts spoken words into written text. Also called automatic speech recognition (ASR), it powers voice assistants, dictation, captions, and call transcription. Modern systems use neural networks and handle many languages, accents, and noisy audio far better than earlier statistical approaches such as hidden Markov models.