Sams-net: A sliced attention-based neural network for music source separation

Published: 23 Jan 2021, Last Modified: 25 Jan 2025. 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP). License: CC BY-NC-SA 4.0
Abstract: Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models, taking spectrograms or waveforms as input, are commonly used for deep-learning-based audio source separation. In this paper, we propose a Sliced Attention-based neural network (Sams-Net) operating in the spectrogram domain for the music source separation task. It enables spectral feature interactions through a multi-head attention mechanism, allows easier parallel computation than LSTMs, and has a larger receptive field than CNNs. Experimental results on the MUSDB18 dataset show that the proposed method, with fewer parameters, outperforms most state-of-the-art DNN-based methods.
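To illustrate the kind of spectral feature interaction the abstract refers to, here is a minimal NumPy sketch of generic scaled dot-product multi-head attention applied to a spectrogram-shaped input. This is not the paper's exact sliced-attention variant; the projection weights are random stand-ins for learned parameters, and all names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, rng):
    # X: (frames, features) spectrogram-like input.
    T, d = X.shape
    assert d % num_heads == 0
    dh = d // num_heads
    # Random projections stand in for the learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    out = np.empty_like(X)
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        # (T, T) score matrix: every frame attends to every other frame,
        # giving a global receptive field in a single layer.
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)
        out[:, s] = softmax(scores, axis=-1) @ V[:, s]
    return out

rng = np.random.default_rng(0)
spec = rng.standard_normal((10, 8))        # 10 frames, 8 spectral features
attended = multi_head_attention(spec, num_heads=4, rng=rng)
```

Unlike an LSTM, each head's score computation is a single matrix product over all frames, so it parallelizes trivially; unlike a CNN, the attention span covers the whole input rather than a fixed kernel neighborhood.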