Selective Kernel Network with Intermediate Supervision Loss for Monaural Speech Enhancement

Published: 01 Jan 2021, Last Modified: 03 Aug 2024, Venue: ICCT 2021, License: CC BY-SA 4.0
Abstract: Speech enhancement aims to separate the speech and noise components of noisy speech, and it has benefited greatly from the convolutional encoder-decoder architecture. Designing more accurate and efficient convolutional kernels that focus mainly on either speech or noise features can further improve enhancement performance. To this end, we introduce a selective kernel convolution that improves feature extraction through an adaptive receptive field size, and we design a novel loss function that provides intermediate supervision to compel the kernels to concentrate on either the clean speech or the noise components. The enhancement results show that the proposed model outperforms competing models in both seen and unseen conditions. We also demonstrate the effectiveness of the proposed mechanisms through qualitative experimental results.
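To illustrate the general idea of a selective kernel convolution, the following is a minimal one-dimensional sketch in plain Python. It is not the architecture from the paper: the function names (`conv1d_same`, `selective_kernel_1d`), the scalar gate weights, and the single-channel signal are all assumptions made for illustration. It shows only the usual split, fuse, and select pattern, in which branches with different kernel sizes are combined using softmax attention weights derived from a pooled summary of the input.

```python
import math


def conv1d_same(x, kernel):
    """1-D correlation with zero padding; output length equals len(x).

    Assumes an odd kernel length so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel_1d(x, kernels, gate_weights):
    """Hypothetical single-channel selective kernel layer.

    kernels:      one odd-length kernel per branch (different receptive fields)
    gate_weights: one scalar per branch, standing in for the learned gating
                  layers a real model would use (an assumption of this sketch)
    """
    # Split: run every branch on the same input.
    branches = [conv1d_same(x, k) for k in kernels]
    # Fuse: sum the branches, then global-average-pool to one descriptor.
    fused = [sum(values) for values in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # Select: softmax across branches gives input-dependent mixing weights.
    attention = softmax([w * descriptor for w in gate_weights])
    output = [
        sum(a * branch[i] for a, branch in zip(attention, branches))
        for i in range(len(x))
    ]
    return output, attention


signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
small = [0.25, 0.5, 0.25]           # narrow receptive field
large = [0.1, 0.2, 0.4, 0.2, 0.1]   # wide receptive field
enhanced, weights = selective_kernel_1d(signal, [small, large], [1.0, -1.0])
```

Because the attention weights depend on the pooled descriptor, the effective receptive field changes with the input, which is the property the abstract refers to as an adaptive receptive field size. In a real network the gating would be learned per channel rather than set by hand.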