DSF-ACodec: A Dual-Scale Spectra Fusion Based Asymmetric Neural Speech Codec

Jinxin Li, Hongxia Bie, Zhao Jing

Published: 2025, Last Modified: 07 Apr 2026, Venue: DSP 2025, License: CC BY-SA 4.0
Abstract: Although recent neural speech codecs have achieved high-fidelity speech reconstruction, they still have two limitations: (1) the symmetric encoder-decoder architecture results in low computational efficiency; (2) downsampling layers in encoders usually cause sampling loss. In this paper, we propose a dual-scale spectra fusion based asymmetric neural speech codec named DSF-ACodec. It employs a one-branch encoder to encode high-resolution amplitude and phase spectra, while a powerful two-branch decoder reconstructs the high-resolution amplitude and phase spectra in parallel. The decoded speech is then generated through the inverse short-time Fourier transform (ISTFT). This asymmetric architecture reduces the number of encoder parameters, improving computational efficiency. Furthermore, we introduce a Spectra-based Skip Connection Module (SSCM), which fuses low-resolution amplitude and phase spectra with the encoded high-resolution spectra features to mitigate sampling loss. Experimental results demonstrate that DSF-ACodec achieves higher speech reconstruction quality than the baseline model, APCodec, while reducing the number of encoder parameters by approximately 28.6%.
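The final synthesis step described in the abstract, turning an amplitude spectrum and a phase spectrum back into a waveform through the ISTFT, can be illustrated with a small NumPy sketch. This is not the DSF-ACodec implementation: the function names `stft` and `istft_from_amp_phase`, the Hann window, and the frame settings (`n_fft=64`, `hop=16`) are assumptions chosen for illustration, and the spectra here come from a plain STFT of a test tone rather than from a neural decoder.

```python
import numpy as np

def stft(x, n_fft=64, hop=16):
    """Complex STFT with a periodic Hann window (illustrative settings)."""
    window = np.hanning(n_fft + 1)[:-1]           # periodic Hann window
    xp = np.pad(x, (n_fft, n_fft))                # zero-pad both ends
    num_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * window
                       for i in range(num_frames)])
    return np.fft.rfft(frames, axis=1)            # shape: (frames, n_fft//2 + 1)

def istft_from_amp_phase(amplitude, phase, length, n_fft=64, hop=16):
    """Rebuild a waveform from amplitude and phase spectra via overlap-add."""
    window = np.hanning(n_fft + 1)[:-1]
    spec = amplitude * np.exp(1j * phase)         # recombine into a complex spectrum
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame     # overlap-add each synthesis frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)                 # undo the squared-window weighting
    return out[n_fft:n_fft + length]              # drop the padding added in stft()

# Stand-in for decoder output: spectra of a 440 Hz tone sampled at 16 kHz.
t = np.arange(400) / 16000.0
speech = np.sin(2 * np.pi * 440.0 * t)
spec = stft(speech)
amplitude, phase = np.abs(spec), np.angle(spec)
reconstructed = istft_from_amp_phase(amplitude, phase, len(speech))
```

In a codec of the kind the abstract describes, `amplitude` and `phase` would be predicted by the two decoder branches instead of being computed from the input, so the reconstruction would only be as accurate as those predictions.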