Abstract: Music source separation aims to disentangle individual sources from a mixture of musical signals. Existing generative adversarial network (GAN) based methods generally operate only in the spectrogram domain. However, this practice ignores patterns in the waveform domain, which are more informative for modelling certain categories of sources. In this paper, we propose a fully hybrid GAN framework that integrates knowledge from both domains. In particular, the generator models acoustic patterns in both the waveform and spectrogram domains, while the discriminator provides discriminative feedback on local spectrogram patches so that the generator produces more plausible separation results. Furthermore, to enhance the quality of the estimated sources, we devise a perceptual spectrogram loss term that complements the waveform-level loss. The proposed method is evaluated on two widely used music source separation datasets, producing separated sources with a high signal-to-distortion ratio (12.03 dB on the MIR-1K dataset and 8.08 dB on the MUSDB18 dataset). These results demonstrate the superiority of the proposed method over state-of-the-art methods.
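To illustrate the hybrid-domain idea described in the abstract (a waveform-level loss complemented by a spectrogram-domain term), the following is a minimal PyTorch sketch. The function names, STFT settings, and loss weight are assumptions for illustration only, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def spectrogram(x, n_fft=1024, hop=256):
    """Magnitude spectrogram via STFT (x: [batch, samples]). Settings are assumed."""
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    return spec.abs()

def hybrid_separation_loss(est_wave, ref_wave, spec_weight=1.0):
    """Waveform-level L1 loss plus a spectrogram-domain term, sketching how a
    perceptual spectrogram loss can complement time-domain training.
    The weighting is a placeholder, not the paper's value."""
    wave_loss = F.l1_loss(est_wave, ref_wave)
    spec_loss = F.l1_loss(spectrogram(est_wave), spectrogram(ref_wave))
    return wave_loss + spec_weight * spec_loss

# Toy usage with random signals (1 s at 16 kHz).
est = torch.randn(2, 16000)   # estimated source
ref = torch.randn(2, 16000)   # reference source
print(hybrid_separation_loss(est, ref))
```

In this sketch, the spectrogram term penalizes errors in time-frequency structure that a pure waveform loss can under-weight, which is the motivation the abstract gives for combining the two domains.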
External IDs: dblp:journals/mta/WuDHW25