Abstract: Individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS) frequently face challenges with articulation, leading to dysarthria and resulting in atypical speech patterns. In healthcare settings, such communication breakdowns reduce the quality of care. While building a mobile application to enable fluid communication, it was found that state-of-the-art (SOTA) automatic speech recognition (ASR) models such as Whisper and wav2vec 2.0 marginalize atypical speakers, largely due to the lack of training data. This work aims to leverage SOTA ASR followed by domain-specific error correction. English dysarthric ASR performance is often evaluated on the TORGO dataset, which suffers from a well-known prompt-overlap issue: the same phrases are spoken by both training and test speakers. An algorithm is proposed to break this prompt overlap. After reducing prompt overlap, SOTA ASR models still produce extremely high word error rates for speakers with mild and severe dysarthria. Furthermore, to improve ASR, the impact of n-gram language models and large language model (LLM) based multi-modal generative error correction algorithms such as Whispering-LLaMA for second-pass ASR is examined. This work highlights how much more needs to be done to improve ASR for atypical speakers to enable equitable healthcare access, both in person and in e-health settings.
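The abstract does not spell out the prompt-overlap-breaking algorithm, so the sketch below is only one plausible way to enforce a prompt-disjoint split on TORGO-style data, not the paper's method. The record keys (`speaker`, `prompt`, `wav_path`) and the tie-breaking policy (dropping overlapping training copies) are assumptions for illustration.

```python
# Minimal sketch, assuming `records` is a list of dicts with hypothetical
# keys "speaker", "prompt", and "wav_path". The paper's actual algorithm
# may balance speakers or prompts differently.
from collections import defaultdict

def split_without_prompt_overlap(records, test_speakers):
    """Ensure no prompt text appears on both sides of the split."""
    by_prompt = defaultdict(list)
    for r in records:
        # Normalize the prompt so trivially different spellings still match.
        by_prompt[r["prompt"].strip().lower()].append(r)

    train, test = [], []
    for utts in by_prompt.values():
        test_utts = [u for u in utts if u["speaker"] in test_speakers]
        if test_utts:
            # This prompt is spoken by a held-out speaker: keep only the
            # test copies and drop the overlapping training copies.
            test.extend(test_utts)
        else:
            train.extend(utts)
    return train, test
```

Under this policy the test set stays speaker-held-out while the training set loses every utterance whose prompt also occurs in the test set, which is the overlap the abstract identifies.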
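Similarly, the abstract mentions n-gram language models for improving ASR without detailing how they are applied. As a rough, hedged illustration of one common use (second-pass n-best rescoring with KenLM), not the paper's configuration: the model file `medical.arpa` and the interpolation weight below are placeholders.

```python
# Hedged sketch: rescore ASR n-best hypotheses with a KenLM n-gram model.
# "medical.arpa" is a hypothetical domain-specific LM, not from the paper.
import kenlm  # pip install kenlm

lm = kenlm.Model("medical.arpa")

def rescore(nbest, lm_weight=0.5):
    """nbest: list of (hypothesis_text, asr_log_score) pairs.

    Returns the hypothesis maximizing a linear combination of the
    first-pass ASR score and the n-gram log10 probability.
    """
    return max(nbest, key=lambda h: h[1] + lm_weight * lm.score(h[0]))[0]
```

LLM-based generative error correction such as Whispering-LLaMA plays an analogous second-pass role, but conditions a generative model on the n-best list (and, being multi-modal, on acoustic features) rather than linearly rescoring it.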