Self-Supervised Learning and Multi-Task Pre-Training Based Single-Channel Acoustic Denoising

Published: 01 Jan 2022 · Last Modified: 31 Jul 2025 · MFI 2022 · CC BY-SA 4.0
Abstract: In self-supervised learning-based single-channel speech denoising, it is challenging to close the gap between the denoising performance on the estimated and the target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve speech denoising performance within self-supervised learning. The proposed pre-training autoencoder (PAE) requires only a very limited set of unpaired and unseen clean speech signals to learn speech latent representations. Meanwhile, to overcome the limitation of a single pre-task, the proposed masking module exploits the dereverberated mask and the estimated ratio mask to denoise the mixture as a new pre-task. The downstream-task autoencoder (DAE) utilizes unlabeled and unseen reverberant mixtures to generate estimated mixtures, and is trained so that its latent representation of the mixtures is shared with that of the clean examples learned by the PAE. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms state-of-the-art approaches.
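A minimal PyTorch sketch of how these pieces could fit together is given below. It is an illustration rather than the authors' implementation: the 1-D convolutional encoder/decoder, the channel widths, the convolutional mask heads in `MaskingModule`, the equal loss weighting, and the cycle-style latent-sharing loss in `dae_step` are all assumptions, and the pre-task losses are written against paired clean/mixture batches for readability even though the paper works with unpaired data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """1-D convolutional encoder; layer sizes are illustrative assumptions."""

    def __init__(self, latent_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(32, latent_ch, kernel_size=8, stride=4, padding=2), nn.ReLU(),
        )

    def forward(self, x):   # x: (batch, 1, samples)
        return self.net(x)  # -> (batch, latent_ch, frames)


class Decoder(nn.Module):
    """Mirror of the encoder, mapping a latent code back to a waveform."""

    def __init__(self, latent_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(latent_ch, 32, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(32, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, z):
        return self.net(z)


class MaskingModule(nn.Module):
    """Multi-task pre-task head: predicts a dereverberated mask and an
    estimated ratio mask over the mixture latent (hypothetical realisation)."""

    def __init__(self, latent_ch: int = 64):
        super().__init__()
        self.dm = nn.Sequential(nn.Conv1d(latent_ch, latent_ch, 3, padding=1), nn.Sigmoid())
        self.erm = nn.Sequential(nn.Conv1d(latent_ch, latent_ch, 3, padding=1), nn.Sigmoid())

    def forward(self, z_mix):
        return self.dm(z_mix) * z_mix, self.erm(z_mix) * z_mix


def pae_step(enc, dec, masker, clean, mixture):
    """One PAE pre-training step: reconstruct clean speech, and apply the two
    masks to a mixture latent as the additional denoising pre-task."""
    z_clean = enc(clean)
    loss = F.mse_loss(dec(z_clean), clean)       # clean-speech reconstruction
    z_dm, z_erm = masker(enc(mixture))
    loss = loss + F.mse_loss(dec(z_dm), clean)   # dereverberated-mask pre-task
    loss = loss + F.mse_loss(dec(z_erm), clean)  # estimated-ratio-mask pre-task
    return loss


def dae_step(dae_enc, dae_dec, pae_enc, pae_dec, mixture):
    """One DAE step on an unlabeled mixture: autoencode it, and tie its latent
    to the clean-speech latent space by cycling through the frozen PAE."""
    z_mix = dae_enc(mixture)
    recon_loss = F.mse_loss(dae_dec(z_mix), mixture)  # estimated mixture
    est_clean = pae_dec(z_mix)                        # read z_mix as a clean latent
    share_loss = F.mse_loss(pae_enc(est_clean), z_mix)
    return recon_loss + share_loss


# Smoke test on random tensors (16384 samples keeps the shapes consistent here).
pae_enc, pae_dec, masker = Encoder(), Decoder(), MaskingModule()
dae_enc, dae_dec = Encoder(), Decoder()
clean, mixture = torch.randn(2, 1, 16384), torch.randn(2, 1, 16384)
pae_step(pae_enc, pae_dec, masker, clean, mixture).backward()
```

In this reading, the PAE weights would be frozen during DAE training (e.g. `pae_enc.requires_grad_(False)`), so the latent-sharing term only shapes the DAE encoder; at inference, enhanced speech would be obtained by decoding the DAE's mixture latent with the PAE decoder.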