Convolutional vs. Recurrent Neural Networks for Audio Source Separation

Shariq Mobin*, Brian Cheung*, Bruno Olshausen

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission
  • Abstract: We propose a convolutional neural network as an alternative to recurrent neural networks for separating out individual speakers in a sound mixture. Our model achieves state-of-the-art results with an order of magnitude fewer parameters. We also characterize the ability of both models to generalize across three different testing conditions, including a novel dataset. We create a new dataset, RealTalkLibri, which evaluates how well source separation models generalize to real-world mixtures. Our results indicate that the acoustics of the environment have a significant impact on the performance of all neural network models, with the convolutional model showing superior ability to generalize to new environments.
  • Keywords: convolutional neural networks, speech, source separation
  • TL;DR: Compared to the traditionally used recurrent neural networks, convolutional neural networks show more robust performance in audio source separation. We create a new dataset which evaluates how well source separation models generalize to real-world mixtures.