Data Augmentation for Dysarthric Speech Recognition Based on Text-to-Speech Synthesis

2022 (modified: 03 Nov 2022), LifeTech 2022
Abstract: In automatic speech recognition (ASR) for people with dysarthria, a major problem is that it is difficult to collect enough training speech data from people with dysarthria. To address this problem, we propose a data augmentation method based on text-to-speech (TTS) synthesis. In the proposed method, a deep neural network (DNN)-based TTS model is trained on speech data recorded from a speaker with dysarthria, and the trained TTS model is then used to generate synthetic speech of that speaker for training the speaker's ASR model. A speech recognition experiment on a person with spinal muscular atrophy (SMA) showed that the proposed data augmentation reduced the speech recognition error rate.
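The augmentation step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the synthetic utterances are pooled with the speaker's real recordings, that the extra training texts come from some separate text corpus, and that `synthesize` stands in for the speaker-specific DNN-based TTS model. The names `Utterance`, `augment_with_tts`, and `synthesize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Utterance:
    """One ASR training example: a waveform paired with its transcript."""
    audio: Sequence[float]
    text: str
    synthetic: bool = False  # True if generated by the TTS model


def augment_with_tts(
    real: List[Utterance],
    extra_texts: List[str],
    synthesize: Callable[[str], Sequence[float]],
) -> List[Utterance]:
    """Return the real utterances plus one TTS-generated utterance per extra text.

    `synthesize` is a hypothetical stand-in for a TTS model already trained
    on the same dysarthric speaker's recordings; any text-to-waveform
    callable can be passed in. Pooling real and synthetic data is an
    assumption of this sketch.
    """
    synthetic = [Utterance(synthesize(t), t, synthetic=True) for t in extra_texts]
    # Keep the real recordings first, then append the synthetic ones.
    return real + synthetic


if __name__ == "__main__":
    # Toy placeholder "TTS": returns silence whose length scales with the text.
    def toy_tts(text: str) -> List[float]:
        return [0.0] * (len(text) * 160)

    real_data = [Utterance([0.1, 0.2, 0.3], "hello")]
    train_set = augment_with_tts(real_data, ["good morning", "thank you"], toy_tts)
    for u in train_set:
        print(u.text, u.synthetic, len(u.audio))
```

The resulting list would then be fed to whatever ASR training routine is used for that speaker; the sketch deliberately leaves the TTS and ASR models abstract, since the abstract does not specify their architectures beyond "DNN-based".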