Convolutional vs. Recurrent Neural Networks for Audio Source Separation

12 Feb 2018 (modified: 05 May 2023) · ICLR 2018 Workshop Submission · Readers: Everyone
Abstract: We propose a convolutional neural network as an alternative to recurrent neural networks for separating out individual speakers in a sound mixture. Our model achieves state-of-the-art results with an order of magnitude fewer parameters. We also characterize the ability of both models to generalize across three different testing conditions, including a novel dataset, RealTalkLibri, which we create to evaluate how well source separation models generalize to real-world mixtures. Our results indicate that the acoustics of the environment have a significant impact on the performance of all neural network models, with the convolutional model showing superior ability to generalize to new environments.
TL;DR: Compared to traditionally used recurrent neural networks, convolutional neural networks show robust performance in audio source separation. We create a new dataset that evaluates how well source separation models generalize to real-world mixtures.
Keywords: convolutional neural networks, speech, source separation