Convolutional vs. Recurrent Neural Networks for Audio Source Separation
Shariq Mobin*, Brian Cheung*, Bruno Olshausen
Feb 12, 2018 (modified: Feb 12, 2018), ICLR 2018 Workshop Submission
Abstract: We propose a convolutional neural network as an alternative to recurrent neural networks for separating individual speakers in a sound mixture. Our model achieves state-of-the-art results with an order of magnitude fewer parameters. We also characterize how well both models generalize under three different testing conditions, including a novel dataset. We create this new dataset, RealTalkLibri, to evaluate how well source separation models generalize to real-world mixtures. Our results indicate that the acoustics of the environment have a significant impact on the performance of all neural network models, with the convolutional model showing a superior ability to generalize to new environments.
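To illustrate why a convolutional model can be much smaller than a recurrent one, the sketch below counts weights using the standard formulas for an LSTM layer and a 1-D convolution layer. All layer sizes (129 frequency bins, four 300-unit bidirectional LSTM layers, eight 64-channel convolutions of kernel width 3) and the helper names are hypothetical and are not taken from the paper. The sketch only shows how the counts scale: each LSTM layer carries four gate matrices whose size grows with the square of the hidden size, while a convolution layer's cost is roughly output channels times input channels times kernel width.

```python
# Rough parameter-count comparison of a recurrent vs. a convolutional stack.
# All layer sizes are hypothetical illustrations, NOT the paper's models.
# Output/projection layers are ignored for both stacks.


def lstm_params(input_size: int, hidden_size: int, bidirectional: bool = True) -> int:
    """One LSTM layer: 4 gates, each with input weights, recurrent weights,
    and a single bias vector (Keras-style convention; PyTorch's nn.LSTM keeps
    two bias vectors, adding 4 * hidden_size per direction)."""
    per_direction = 4 * (hidden_size * (input_size + hidden_size) + hidden_size)
    return per_direction * (2 if bidirectional else 1)


def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """One 1-D convolution layer with bias (dilation does not change this count)."""
    return out_channels * (in_channels * kernel_size + 1)


def blstm_stack(n_freq: int, hidden: int, n_layers: int) -> int:
    """Total parameters of a stack of bidirectional LSTM layers."""
    total, in_size = 0, n_freq
    for _ in range(n_layers):
        total += lstm_params(in_size, hidden, bidirectional=True)
        in_size = 2 * hidden  # next layer sees concatenated forward/backward states
    return total


def conv_stack(n_freq: int, channels: int, kernel: int, n_layers: int) -> int:
    """Total parameters of a stack of 1-D convolutions over time,
    treating frequency bins as input channels of the first layer."""
    total, in_ch = 0, n_freq
    for _ in range(n_layers):
        total += conv1d_params(in_ch, channels, kernel)
        in_ch = channels
    return total


if __name__ == "__main__":
    # Hypothetical input: 129 frequency bins (e.g. a 256-point STFT).
    rnn = blstm_stack(n_freq=129, hidden=300, n_layers=4)
    cnn = conv_stack(n_freq=129, channels=64, kernel=3, n_layers=8)
    print(f"BLSTM stack: {rnn:,} parameters")
    print(f"Conv stack:  {cnn:,} parameters")
    print(f"ratio: {rnn / cnn:.1f}x")
```

The exact ratio depends entirely on the chosen sizes; the hypothetical configuration here is meant only to show where the difference comes from, not to reproduce the paper's models or numbers.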
TL;DR: Compared to the traditionally used recurrent neural networks, convolutional neural networks show robust performance in audio source separation. We create a new dataset that evaluates how well source separation models generalize to real-world mixtures.