Abstract: Music source separation aims to disentangle individual sources from a mixture of musical signals. Existing generative adversarial network (GAN) based methods generally operate only in the spectrogram domain. However, this practice ignores patterns in the waveform domain, which are more informative for modelling certain categories of sources. In this paper, we propose a fully hybrid GAN framework that integrates knowledge from both domains. In particular, the generator models acoustic patterns in both the waveform and spectrogram domains, while the discriminator provides discriminative feedback on local spectrogram patches so that the generator produces more plausible separation results. Furthermore, to enhance the quality of the estimated sources, we devise a perceptual spectrogram loss term that complements the waveform-level loss. The proposed method is evaluated on two widely used music source separation datasets, producing separated sources with a high signal-to-distortion ratio (12.03 dB on the MIR-1K dataset and 8.08 dB on the MUSDB18 dataset). These results demonstrate the superiority of the proposed method over state-of-the-art methods.
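To illustrate the hybrid-domain idea described in the abstract (a waveform-level loss complemented by a spectrogram-domain term), the following is a minimal PyTorch sketch. The function names, STFT settings, and loss weight are assumptions for illustration only, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def spectrogram(x, n_fft=1024, hop=256):
    """Magnitude spectrogram via STFT (x: [batch, samples]). Settings are assumed."""
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    return spec.abs()

def hybrid_separation_loss(est_wave, ref_wave, spec_weight=1.0):
    """Waveform-level L1 loss plus a spectrogram-domain term, sketching how a
    perceptual spectrogram loss can complement time-domain training.
    The weighting is a placeholder, not the paper's value."""
    wave_loss = F.l1_loss(est_wave, ref_wave)
    spec_loss = F.l1_loss(spectrogram(est_wave), spectrogram(ref_wave))
    return wave_loss + spec_weight * spec_loss

# Toy usage with random signals (1 s at 16 kHz).
est = torch.randn(2, 16000)   # estimated source
ref = torch.randn(2, 16000)   # reference source
print(hybrid_separation_loss(est, ref))
```

In this sketch, the spectrogram term penalizes errors in time-frequency structure that a pure waveform loss can under-weight, which is the motivation the abstract gives for combining the two domains.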
External IDs: dblp:journals/mta/WuDHW25