Abstract: Speech enhancement aims to separate the speech and noise components of a noisy speech signal, and it has benefited greatly from the convolutional encoder-decoder architecture. Designing more accurate and efficient convolutional kernels that focus mainly on either speech or noise features can further improve enhancement performance. To this end, we introduce a selective kernel convolution that improves feature extraction through an adaptive receptive field size, and we design a novel loss function that provides intermediate supervision to compel the kernels to concentrate on either the clean or the noise components. Enhancement results show that the proposed model outperforms competing methods in both seen and unseen conditions. We also demonstrate the effectiveness of the proposed mechanisms through qualitative experimental results.
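To make the idea of an adaptive receptive field concrete, the following is a minimal sketch of the general selective-kernel pattern (split into branches with different kernel sizes, fuse them into a channel descriptor, then select between branches with a per-channel softmax). It is not the layer used in this work, whose details the abstract does not give. The function names, the two-branch design, the depthwise 1-D convolutions, and all shapes and weights are assumptions chosen for illustration.

```python
import numpy as np

def conv1d_same(x, w):
    """Depthwise 1-D convolution. x: (C, T), w: (C, K) with odd K <= T."""
    return np.stack([np.convolve(x[c], w[c], mode="same")
                     for c in range(x.shape[0])])

def selective_kernel(x, w_small, w_large, W_reduce, W_a, W_b):
    """Hypothetical two-branch selective-kernel layer (illustrative only)."""
    # Split: two branches with different receptive field sizes.
    u_small = conv1d_same(x, w_small)
    u_large = conv1d_same(x, w_large)
    # Fuse: sum the branches, then average over time to get one value per channel.
    s = (u_small + u_large).mean(axis=1)           # (C,)
    z = np.maximum(W_reduce @ s, 0.0)              # (D,) compact descriptor, ReLU
    # Select: softmax across the two branches, separately for each channel.
    logits = np.stack([W_a @ z, W_b @ z])          # (2, C)
    logits = logits - logits.max(axis=0, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)  # columns sum to 1
    out = attn[0][:, None] * u_small + attn[1][:, None] * u_large
    return out, attn

# Toy input: 4 channels, 16 time steps, random weights (assumed values).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
out, attn = selective_kernel(
    x,
    w_small=rng.standard_normal((4, 3)),   # kernel size 3
    w_large=rng.standard_normal((4, 7)),   # kernel size 7
    W_reduce=rng.standard_normal((2, 4)),  # C=4 -> D=2
    W_a=rng.standard_normal((4, 2)),
    W_b=rng.standard_normal((4, 2)),
)
```

Each column of `attn` holds the weights one channel assigns to the small and large kernels, so the effective receptive field varies per channel and per input.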