Improving Automated Speech Recognition Using Retrieval-Based Voice Conversion

Published: 19 Mar 2024, Last Modified: 31 May 2024
Venue: Tiny Papers @ ICLR 2024 Present
License: CC BY 4.0
Keywords: Voice Conversion Techniques; Automatic Speech Recognition (ASR); Non-Native English Speakers; OpenAI Whisper Models; Word Error Rates (WER)
Abstract: This study examines how effectively voice conversion techniques improve Automatic Speech Recognition (ASR) accuracy for non-native English speakers. Using OpenAI Whisper models, we analyzed transcription accuracy across various accents and countries. Applying voice conversion produced significant reductions in Word Error Rates (WER), with the Whisper Large-v2 model showing the most pronounced improvements. Our findings indicate that advanced voice conversion can mitigate accent bias, promoting inclusivity and broadening the applicability of ASR technology to a more diverse user base.
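The abstract reports results as Word Error Rate, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference transcript into the ASR output, and N is the number of reference words. The following is a minimal illustrative sketch of that metric, not the authors' evaluation code. It assumes plain whitespace tokenization with no text normalization (no lowercasing or punctuation stripping), which real evaluations usually apply first. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; assumes tokens are separated by
    whitespace and compared exactly (no normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six reference words, giving a WER of 1/6. Comparing this value for a transcript of the original audio against one of the voice-converted audio is one way to express the kind of WER reduction the abstract describes.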
Submission Number: 103