Leveraging LEXICAL and GRAMMATICAL Errors: Extending ASR Error Measurements through NLP

ACL ARR 2024 June Submission 3894 Authors

16 Jun 2024 (modified: 12 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This paper addresses the limitations of current Automatic Speech Recognition (ASR) evaluation metrics by highlighting the inadequacies of overall error rates, particularly Word Error Rate (WER). While WER offers a broad assessment, it lacks the granularity needed to identify which linguistic categories are affected by errors. We propose an NLP-driven metric based on parts of speech and grammatical categories that provides a more in-depth analysis of ASR errors. Using the Whisper ASR system on English, Japanese, and Spanish data from the CommonVoice 15 dataset, we analyze GRAMMATICAL and LEXICAL error rates. Results show that GRAMMATICAL words trigger fewer errors than LEXICAL words across all languages, and that Proper Nouns combined with case markers in Japanese are associated with higher accuracy. By categorizing errors according to these linguistic attributes, our methodology aims to enhance the explanatory power of error analysis in ASR, contributing to a more precise, NLP-based evaluation of system performance.
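The abstract does not spell out how the GRAMMATICAL/LEXICAL split or the error attribution is computed, so the snippet below is only a minimal illustrative sketch: it assumes the split follows Universal Dependencies closed- versus open-class POS tags (via spaCy, an assumed tooling choice), aligns reference and hypothesis with a simple sequence matcher, and counts reference-side substitutions and deletions per category. The function name, category definitions, and alignment procedure are assumptions, not the authors' exact method.

```python
# Minimal sketch of category-specific ASR error rates.
# Assumption: GRAMMATICAL = closed-class (function word) UD POS tags;
# everything else is LEXICAL. The paper's exact definitions may differ.
import difflib
import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

# Closed-class POS tags treated here as GRAMMATICAL (assumed split).
GRAMMATICAL_POS = {"ADP", "AUX", "CCONJ", "DET", "PART", "PRON", "SCONJ"}


def category_error_rates(reference: str, hypothesis: str) -> dict:
    """Per-category error rates for one reference/hypothesis pair."""
    doc = [t for t in nlp(reference) if not t.is_punct]
    ref_tokens = [t.text.lower() for t in doc]
    ref_pos = [t.pos_ for t in doc]
    hyp_tokens = hypothesis.lower().split()

    errors = {"GRAMMATICAL": 0, "LEXICAL": 0}
    totals = {"GRAMMATICAL": 0, "LEXICAL": 0}
    for pos in ref_pos:
        cat = "GRAMMATICAL" if pos in GRAMMATICAL_POS else "LEXICAL"
        totals[cat] += 1

    # Align reference and hypothesis; reference words outside matching
    # blocks count as errors (substitutions or deletions; insertions
    # on the hypothesis side are not attributed to any category here).
    matcher = difflib.SequenceMatcher(a=ref_tokens, b=hyp_tokens)
    matched_ref_idx = set()
    for block in matcher.get_matching_blocks():
        matched_ref_idx.update(range(block.a, block.a + block.size))

    for i, pos in enumerate(ref_pos):
        if i not in matched_ref_idx:
            cat = "GRAMMATICAL" if pos in GRAMMATICAL_POS else "LEXICAL"
            errors[cat] += 1

    return {cat: errors[cat] / totals[cat] if totals[cat] else 0.0
            for cat in totals}


# Example: one GRAMMATICAL error ("the" -> "a") and one LEXICAL error
# ("mat" -> "matt"), each relative to its own category's word count.
print(category_error_rates("the cat sat on the mat",
                           "the cat sat on a matt"))
```

In practice, per-category counts would be accumulated over the whole CommonVoice 15 test set and per language, with language-specific taggers (e.g. for Japanese case markers), rather than computed per sentence as in this sketch.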
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, speech technologies, part-of-speech tagging, dependency parsing, morphologically-rich languages, POS tagging
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English, Japanese, Spanish
Submission Number: 3894