Using AI to Automate Phonetic Transcription and Perform Forced Alignment for Clinical Application in the Assessment of Speech Sound Disorders
Keywords: artificial intelligence, phonetic transcription, forced alignment, speech-to-text, speech sound disorders, children, clinical practice
TL;DR: We are using AI to automate phonetic transcription and perform forced alignment for clinical application in assessing children with speech sound disorders.
Abstract: Speech-language pathologists (S-LPs) routinely use phonetic transcription to profile and describe the characteristics of a child's speech in the assessment of speech sound disorders (SSDs). The literature identifies phonetic transcription as a demanding perceptual skill, with accuracy and reliability dependent on experience, available resources, and the nature of the SSD. Automatic speech recognition and segmentation techniques, which recognize, transcribe, and align the content of audio files, have been identified as a possible tool to improve the accuracy and efficiency of the auditory-perceptual transcription undertaken by S-LPs. In this paper, we propose a model that automates phonetic transcription and performs forced alignment for disordered child speech. Using the state-of-the-art wav2vec 2.0 acoustic model and advanced post-processing algorithms, our model achieves a phoneme error rate of 0.15 and an $F_1$ score of 82% on the UltraSuite dataset. These results suggest a level of accuracy greater than what has been reported for auditory-perceptual transcription in the clinical setting.
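The phoneme error rate (PER) reported in the abstract is conventionally computed as the phoneme-level Levenshtein edit distance (substitutions + deletions + insertions) normalized by the length of the reference transcription. A minimal sketch of that metric follows; the phoneme symbols in the usage example are illustrative, not drawn from the UltraSuite dataset:

```python
def phoneme_error_rate(reference, hypothesis):
    """PER = (substitutions + deletions + insertions) / len(reference),
    i.e. Levenshtein edit distance over phoneme sequences, normalized."""
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i reference phonemes
    for j in range(n + 1):
        dp[0][j] = j  # insert all j hypothesis phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[m][n] / m


# Illustrative example: one substitution in a three-phoneme target word
print(phoneme_error_rate(["k", "ae", "t"], ["k", "ae", "t"]))  # → 0.0
print(phoneme_error_rate(["k", "ae", "t"], ["k", "a", "t"]))   # ≈ 0.33
```

On this definition, the reported PER of 0.15 means roughly one phoneme-level error per seven reference phonemes.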
Submission Number: 33