Speech Recognition
Technology that converts spoken words into text; also called ASR.
Definition
Speech recognition is a technology that converts spoken words into written text. Also called automatic speech recognition (ASR), it powers voice assistants, dictation, captioning, and call transcription. Modern systems are built on neural networks and handle multiple languages, varied accents, and noisy audio far better than earlier statistical approaches such as hidden Markov models.