Abstract: Recent developments in neural network design have significantly advanced the state of the art in music source separation. For example, the recently proposed band-split RNN (BSRNN) splits the input spectrogram into fine-grained subband features, performs interleaved sequence-level and band-level modeling, and achieves superior performance on the MUSDB18 dataset. However, the original BSRNN requires different band-split schemes and networks for different instruments, which greatly increases the cost of model training and inference. Moreover, it was designed for the single-channel scenario, whereas music signals are typically stereo. In this paper, we extend BSRNN to single-input-multi-output (SIMO) and stereo modes, where all tracks are jointly extracted by a single network that supports stereo signal modeling. Experimental results show that the SIMO stereo BSRNN effectively improves the overall performance across all tracks.
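To make the band-split idea concrete, the following is a minimal sketch of dividing a spectrogram's frequency axis into contiguous subbands. It is an illustration only, not the authors' implementation: the function names `band_edges` and `band_split`, the `[frequency][time]` list layout, and the bandwidths `[1, 1, 2, 4]` are all assumptions chosen for the example, and the sequence-level and band-level RNN modeling is not shown.

```python
def band_edges(bandwidths, n_bins):
    """Turn per-band widths (in frequency bins) into (start, stop) index pairs."""
    if sum(bandwidths) != n_bins:
        raise ValueError("bandwidths must cover every frequency bin exactly once")
    edges, start = [], 0
    for width in bandwidths:
        edges.append((start, start + width))
        start += width
    return edges


def band_split(spectrogram, bandwidths):
    """Split a spectrogram stored as [frequency][time] into a list of subbands."""
    edges = band_edges(bandwidths, len(spectrogram))
    return [spectrogram[lo:hi] for lo, hi in edges]


# Toy input: 8 frequency bins x 3 frames; each value encodes its bin index.
spec = [[float(f)] * 3 for f in range(8)]

# Hypothetical scheme: narrow bands at low frequencies, wider bands higher up.
subbands = band_split(spec, [1, 1, 2, 4])
band_sizes = [len(band) for band in subbands]
```

In this sketch, a single band-split scheme shared across all tracks would correspond to calling `band_split` once with one list of bandwidths, rather than once per instrument with instrument-specific lists.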