Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech Recognition

Hu Hu, Tian Tan, Yanmin Qian

Published: 2018 (ICASSP 2018). Last Modified: 14 May 2025. License: CC BY-SA 4.0

Abstract: Data augmentation is an effective way to enlarge the training data and reduce the mismatch between training and testing conditions in noise robust speech recognition. Unlike traditional approaches that add noise directly to the original waveform, this work uses generative adversarial networks (GANs) to generate data that improves speech recognition under noisy conditions. With this method, the speech samples are generated at the spectral-feature level, frame by frame, with no dependence between frames, and the augmented data has no true labels. To make effective use of this untranscribed augmented data, an unsupervised learning framework is designed for acoustic modeling. The proposed GAN-based data augmentation approach is evaluated on Aurora4. The experimental results show that it achieves a relative WER reduction of about 7.0% over an advanced acoustic model.
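The abstract only outlines the pipeline, so the following is a minimal, stdlib-only sketch of three ideas it names, under loudly stated assumptions. First, generated frames are independent: each spectral-feature frame comes from its own noise vector. The "generator" here is an untrained random linear layer with `tanh` standing in for a trained GAN generator; adversarial training itself is not shown. Second, because generated frames have no transcription, the sketch trains on them with soft targets taken from a teacher acoustic model's posteriors. This teacher-student objective is an assumption for illustration only; the abstract says that an unsupervised framework is used but does not specify its loss. Third, it shows how a relative WER reduction is computed. All names, dimensions, and WER values (`FEAT_DIM`, `generate_frames`, `soft_target_loss`, `10.0`/`9.3`, etc.) are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical sizes for illustration only (not from the paper).
FEAT_DIM = 4     # spectral feature dimension per frame
NOISE_DIM = 3    # dimension of the generator's input noise vector z
NUM_STATES = 5   # acoustic-model output classes (e.g. tied HMM states)


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    # One affine layer: weights holds one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_layer(n_out, n_in, rng):
    # Hypothetical randomly initialised layer standing in for a trained network.
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def generate_frames(gen_layer, num_frames, rng):
    # Each frame gets its own noise vector, so frames are generated independently
    # (no temporal context), mirroring the frame-level generation in the abstract.
    frames = []
    for _ in range(num_frames):
        z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        frames.append([math.tanh(v) for v in linear(*gen_layer, z)])
    return frames


def soft_target_loss(teacher_layer, student_layer, frames):
    # ASSUMPTION: untranscribed frames are labelled with teacher posteriors p,
    # and the student is trained on the mean per-frame cross-entropy H(p, q).
    total = 0.0
    for x in frames:
        p = softmax(linear(*teacher_layer, x))
        q = softmax(linear(*student_layer, x))
        total -= sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(frames)


def relative_wer_reduction(baseline_wer, new_wer):
    # Relative reduction in percent: 100 * (baseline - new) / baseline.
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    rng = random.Random(0)
    generator = random_layer(FEAT_DIM, NOISE_DIM, rng)
    teacher = random_layer(NUM_STATES, FEAT_DIM, rng)
    student = random_layer(NUM_STATES, FEAT_DIM, rng)
    frames = generate_frames(generator, 8, rng)
    print(len(frames), len(frames[0]))  # 8 4
    print(soft_target_loss(teacher, student, frames))
    # Hypothetical WERs, not results from the paper.
    print(round(relative_wer_reduction(10.0, 9.3), 1))  # 7.0
```

With soft targets, cross-entropy and KL divergence differ only by the teacher's entropy. That term is constant with respect to the student, so minimising either gives the same student update.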