Dynamic Slimmable Network for Speech Separation

Published: 2024, Last Modified: 26 May 2026, IEEE Signal Process. Lett. 2024, CC BY-SA 4.0
Abstract: Neural networks for speech separation generally exhibit high computational costs and large memory footprints. Moreover, typical separation networks have a fixed computational graph that processes all input frames at a uniform computational cost, even though intensive processing may not be necessary for frames containing silence or a single active speaker. Addressing this computational inefficiency is especially crucial when these networks are deployed on resource-constrained devices. In this letter, we propose a dynamic slimmable network for speech separation that mitigates the computational inefficiency of existing networks. We introduce slimmable layers with a gating mechanism that can adapt their computational complexity based on the input characteristics. As an example, we propose to use the slimmable layers in the intra-chunk blocks of a dual-path structure-based network to facilitate adaptation based on the local characteristics of the input signal. Experimental evaluation on simulated two-speaker mixtures from the WSJ0-2mix dataset demonstrates that the proposed method substantially reduces the computational cost while maintaining comparable performance to fully utilized static networks.
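To make the idea of a gated slimmable layer concrete, the following is a minimal sketch and not the letter's implementation. It assumes a linear layer whose active output units form a prefix of the full weight matrix, and it replaces the gating mechanism with a hand-written frame-energy threshold. The names `SlimmableLinear`, `energy_gate`, `WIDTHS` and the threshold values are all hypothetical; the abstract does not specify the gate's form or the candidate widths.

```python
import random

# Hypothetical candidate width ratios; the abstract does not specify them.
WIDTHS = (0.25, 0.5, 1.0)


class SlimmableLinear:
    """Linear layer that can run with only a prefix of its output units."""

    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        self.in_features = in_features
        self.out_features = out_features
        # Full-width parameters; narrower widths reuse the first rows.
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_features)]
            for _ in range(out_features)
        ]
        self.bias = [0.0] * out_features

    def forward(self, x, width):
        """Return (output, multiply-accumulate count) for one input frame."""
        active = max(1, int(self.out_features * width))
        out = [
            sum(w * v for w, v in zip(self.weight[i], x)) + self.bias[i]
            for i in range(active)
        ]
        # Zero-pad inactive units so the output size stays fixed.
        out += [0.0] * (self.out_features - active)
        return out, active * self.in_features


def energy_gate(frame, low=1e-4, high=1e-2):
    """Assumed stand-in gate: pick a width from the mean frame energy."""
    energy = sum(v * v for v in frame) / len(frame)
    if energy < low:
        return WIDTHS[0]
    if energy < high:
        return WIDTHS[1]
    return WIDTHS[2]


layer = SlimmableLinear(in_features=8, out_features=16)
for name, frame in (("silent", [0.0] * 8), ("loud", [0.5] * 8)):
    width = energy_gate(frame)
    _, macs = layer.forward(frame, width)
    print(name, width, macs)
```

In this sketch a silent frame uses a quarter of the layer's output units while a high-energy frame uses all of them, which mirrors the input-dependent cost the abstract describes. A trained gate would presumably replace the fixed energy thresholds.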