Abstract: Speech denoising is the task of recovering clean speech from a speech signal corrupted by background noise. Outside of high-end recording studios, a clean speech signal is rarely available, because some background noise or noise from the recording device is always present. We propose an approach that denoises a noisy speech signal by modeling the noise explicitly. Existing approaches model the speech, potentially of multiple speakers, for denoising. Such approaches have an inherent drawback: a separate model is required for each speaker. We show that modeling the noise instead of the speaker(s) yields a unified, speaker-independent denoiser, in contrast to the speaker-dependent ones in existing popular approaches. In addition to a novel speech denoising network, we also propose a large-scale noise dataset, AudioNoiseSet, derived from the AudioSet dataset, to train our model. We show that our model outperforms prior approaches by a significant margin on a large-scale, in-the-wild speech dataset, i.e., AVSpeech, under standard quantitative metrics. In addition, we show with multiple human ratings that our method is preferred over state-of-the-art approaches. The user study also points to limitations of the metrics used, which we discuss. We also provide many qualitative results that illustrate these improvements.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Supplementary Material: zip