Evaluating Speech To Text For Children With Speech Impairment

Published: 06 Mar 2025, Last Modified: 04 Apr 2025
Venue: ICLR 2025 Workshop AI4CHL Poster
License: CC BY 4.0
Track: Tiny paper
Keywords: Speech, Speech defects
Abstract: Speech-to-text (STT) technology has advanced significantly, yet it remains inadequate for children with speech impairments, including dysarthria, dysphonia, stuttering, speech affected by neurological disorders such as cerebral palsy, and articulation disorders induced by hearing loss. This study evaluates three widely used STT models (Whisper, wav2vec 2.0, and LibriSpeech-based systems) on speech affected by these conditions, using standard speech recognition metrics: Word Error Rate (WER), Match Error Rate (MER), Word Information Lost (WIL), and Word Information Preserved (WIP). Results indicate that all models exhibit substantial transcription errors when processing disordered pediatric speech. Whisper showed moderate success but frequently misclassified disordered speech as noise. wav2vec 2.0 demonstrated improved adaptability but struggled with irregular rhythm and prosody. LibriSpeech-based models, trained on fluent adult speech, performed the worst, producing nearly unintelligible transcripts of pediatric speech. These findings underscore the systemic exclusion of speech impairments from mainstream STT development. Future models must incorporate diverse pediatric datasets, improve adaptability to disordered speech, and prioritize accessibility in their design.
Submission Number: 32
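
The abstract names four metrics (WER, MER, WIL, WIP) without stating how they are computed. The following is a minimal sketch, assuming the commonly used definitions in terms of hits H, substitutions S, deletions D, and insertions I from a minimum edit-distance word alignment: WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I), WIP = (H/N_ref)(H/N_hyp), and WIL = 1 - WIP. It is not the authors' evaluation code; the names `Scores`, `align_counts`, and `score`, the lowercase-and-split normalization, and the sample sentences are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    wer: float  # Word Error Rate
    mer: float  # Match Error Rate
    wil: float  # Word Information Lost
    wip: float  # Word Information Preserved


def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for two word lists,
    taken from one minimum edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner, counting each operation.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (not same):
            if same:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def score(reference: str, hypothesis: str) -> Scores:
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    h, s, d, i = align_counts(ref, hyp)
    errors = s + d + i
    wer = errors / (h + s + d)        # normalized by reference length
    mer = errors / (h + s + d + i)    # normalized by alignment length
    wip = (h / len(ref)) * (h / len(hyp)) if hyp else 0.0
    return Scores(wer=wer, mer=mer, wil=1.0 - wip, wip=wip)


if __name__ == "__main__":
    # Hypothetical reference/transcript pair, for illustration only.
    print(score("the cat sat on the mat", "the cat sit on mat"))
```

In the illustrative pair, the alignment has four hits, one substitution, and one deletion, so WER and MER are both 2/6. WER can exceed 1 when a transcript contains many insertions, whereas MER, WIL, and WIP stay within [0, 1], which is one reason to report them together.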