High-Fidelity GAN-based Vocoder with Conditioning Subband Network and Magnitude-aware Phase Loss

ICLR 2026 Conference Submission 17842 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: subband condition network, magnitude-aware anti-wrapping phase loss, phase wrapping
Abstract: Recent vocoders are dominated by GAN-based networks that aim to generate high-quality waveforms from mel-spectrogram representations. However, these methods typically operate as black boxes, which loses information inherent in the mel-spectrogram. In this paper, we propose SCNet, a GAN-based vocoder with a Subband Condition Network, to address this limitation. Specifically, SCNet takes a subband signal predicted by a condition network as prior knowledge. The subband signal is then converted by the short-time Fourier transform (STFT) into Fourier spectral coefficients, which are integrated into the GAN-based backbone network. Additionally, to avoid the phase wrapping issue, we propose a magnitude-aware anti-wrapping phase loss that computes the instantaneous phase errors between predicted and raw phase values. The magnitude of the raw signal is also incorporated into this loss, so that positions where the magnitude is larger receive more weight. In our experiments, SCNet proves effective and achieves superior performance in high-quality waveform generation on both subjective and objective metrics. The source code is available at https://anonymous.4open.science/r/SCNet-94D1.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17842
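
The abstract does not give the exact form of the magnitude-aware anti-wrapping phase loss, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions. It assumes the common anti-wrapping function |x - 2π·round(x / 2π)|, which maps a phase difference to its principal value in [0, π], and it assumes the weights are the raw magnitudes normalized to sum to one. The function names `anti_wrap` and `magnitude_aware_phase_loss` and the `eps` parameter are hypothetical and are not taken from the SCNet source code.

```python
import numpy as np


def anti_wrap(x):
    """Map a phase difference to its principal-value distance in [0, pi].

    Assumed form: |x - 2*pi*round(x / (2*pi))|. Differences that are
    multiples of 2*pi map to 0, so wrapped phases are not penalized.
    """
    x = np.asarray(x, dtype=float)
    return np.abs(x - 2.0 * np.pi * np.round(x / (2.0 * np.pi)))


def magnitude_aware_phase_loss(phase_pred, phase_ref, mag_ref, eps=1e-8):
    """Magnitude-weighted mean of anti-wrapped instantaneous phase errors.

    Assumption: weights are the reference magnitudes normalized to sum
    to one, so bins with larger magnitude contribute more to the loss.
    """
    phase_pred = np.asarray(phase_pred, dtype=float)
    phase_ref = np.asarray(phase_ref, dtype=float)
    mag_ref = np.asarray(mag_ref, dtype=float)

    err = anti_wrap(phase_pred - phase_ref)      # per-bin phase error
    weights = mag_ref / (mag_ref.sum() + eps)    # normalized magnitude weights
    return float(np.sum(weights * err))


if __name__ == "__main__":
    ref = np.array([0.3, -1.2])
    mag = np.array([3.0, 1.0])
    # A prediction shifted by exactly 2*pi is the same phase, so the loss is ~0.
    print(magnitude_aware_phase_loss(ref + 2.0 * np.pi, ref, mag))
    # An error in the high-magnitude bin costs more than the same error
    # in the low-magnitude bin.
    print(magnitude_aware_phase_loss(ref + np.array([1.0, 0.0]), ref, mag))
    print(magnitude_aware_phase_loss(ref + np.array([0.0, 1.0]), ref, mag))
```

In this sketch a one-radian error placed in the bin with magnitude 3 yields a larger loss than the same error in the bin with magnitude 1, which matches the stated intent of giving more weight where the magnitude is larger. The loss in the paper may also cover other phase-derived quantities or use a different weighting.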