Cycle-Consistent GAN Front-End to Improve ASR Robustness to Perturbed Speech


Published: 16 Nov 2018, Last Modified: 05 May 2023
NIPS 2018 Workshop IRASL Blind Submission
Readers: Everyone
Abstract: Automatic Speech Recognition (ASR) systems, which perform well on regular speech, are vulnerable to adversarial examples generated by small perturbations of the audio signal. Even naturally occurring perturbations, caused by the emotional and physical states of the speaker, can significantly degrade ASR performance. In this paper, we propose a front-end based on a Cycle-Consistent Generative Adversarial Network (CycleGAN) that reduces these perturbations and thereby makes ASR more robust. The CycleGAN is trained on non-parallel examples of perturbed and normal speech. In experiments on spontaneously generated laughter-speech and creaky-voice datasets, tested with Google Cloud ASR, speech converted by the CycleGAN-based front-end shows absolute WER improvements of 14.9% and 11%, respectively, over the original perturbed speech.
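The results above are stated as absolute changes in word error rate (WER). As a reminder of what that measures, here is a minimal sketch using the standard definition of WER (word-level edit distance divided by the number of reference words). The abstract does not say which scoring tool the authors used, so the function names `word_error_rate` and `absolute_improvement` are hypothetical and this is not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(wer_before: float, wer_after: float) -> float:
    """Absolute improvement: the plain difference between the two WERs."""
    return wer_before - wer_after
```

Under this definition, an absolute improvement of 14.9% means the WER on converted speech is 14.9 percentage points lower than on the original perturbed speech. It is not a relative reduction, which would divide that difference by the original WER.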