Abstract: Dysarthria is a neurological disability that affects the muscles involved in articulation, resulting in a speech impairment that reduces speech intelligibility. In severe cases, affected individuals may also have physical disabilities that prevent them from interacting with digital devices. For such individuals, Automatic Speech Recognition (ASR) technologies could be life-changing, enabling them to communicate with others and to operate computing devices via voice commands. Nonetheless, ASR systems designed to recognize healthy speech perform very poorly when transcribing dysarthric speech, signaling the need for ASR systems specifically tailored to dysarthria. Dysarthric Speech Recognition (DSR) research has progressed gradually because of the challenges the research community faces, such as the scarcity of dysarthric speech data, which prevents researchers from training the deeper acoustic models needed to better learn dysarthric speech variations. In this paper, we report preliminary findings on improving our previous DSR system, Speech Vision, by studying the effect of depthwise separable convolutions on its acoustic model. Speech Vision is a novel DSR system that learns to recognize the shape of the words uttered by dysarthric speakers instead of recognizing phone sequences and then mapping them to words. Experiments conducted on the utterances of all UA-Speech dysarthric speakers indicate that the proposed depthwise separable architecture yields better word recognition accuracy than the original Speech Vision architecture across all dysarthric speech intelligibility classes.
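As background for the architectural change studied above, a depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) convolution, sharply reducing the parameter count; this is a useful property when training data is scarce, as with dysarthric speech. The following is a minimal PyTorch sketch for illustration only: the channel and kernel sizes are hypothetical, and this is not the authors' implementation.

import torch
import torch.nn as nn

class DepthwiseSeparableConv2d(nn.Module):
    """Factorizes a standard convolution into a depthwise step
    (one filter per input channel) and a pointwise 1x1 step
    (mixes information across channels)."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        # Depthwise: groups=in_channels applies one filter per channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels,
                                   bias=False)
        # Pointwise: 1x1 convolution combines the channel outputs.
        self.pointwise = nn.Conv2d(in_channels, out_channels,
                                   kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Hypothetical spectrogram-like input: (batch, channels, freq, time).
# A standard 3x3 conv from 32 to 64 channels needs 3*3*32*64 = 18,432
# weights; the separable version needs 3*3*32 + 32*64 = 2,336.
x = torch.randn(1, 32, 128, 128)
block = DepthwiseSeparableConv2d(32, 64)
print(block(x).shape)  # torch.Size([1, 64, 128, 128])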