Self-supervised Learning for Speech Enhancement

Yu-Che Wang; Shrikant Venkataramani; Paris Smaragdis

11 Jun 2020 (modified: 04 May 2025) | Submitted to SAS 2020 | Readers: Everyone

Abstract: Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the conditions on the training data, we consider the task of training speech enhancement networks in a self-supervised manner. We first use a limited training set of clean speech sounds and learn a latent representation by autoencoding on their magnitude spectrograms. We then autoencode on speech mixtures recorded in noisy environments and train the resulting autoencoder to share a latent representation with the clean examples. We show that using this training schema, we can now map noisy speech to its clean version using a network that is autonomously trainable without requiring labeled training examples or human intervention.
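The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the two autoencoders are made to share a latent representation, so the latent-consistency term in `noisy_loss` is an assumption, as are the one-layer ReLU architecture, the frame and hop sizes, and every function name. The weights are left untrained; in practice these losses would be minimized with an automatic-differentiation framework, with the clean autoencoder trained first.

```python
import numpy as np

rng = np.random.default_rng(0)

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, return |STFT| as (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def init_autoencoder(n_bins, n_latent):
    """Hypothetical one-layer encoder/decoder weights (random, untrained)."""
    return {"enc": rng.normal(0.0, 0.1, (n_bins, n_latent)),
            "dec": rng.normal(0.0, 0.1, (n_latent, n_bins))}

def encode(ae, x):
    return np.maximum(x @ ae["enc"], 0.0)  # ReLU latent code

def decode(ae, h):
    return np.maximum(h @ ae["dec"], 0.0)  # ReLU keeps magnitudes non-negative

def clean_loss(clean_ae, s):
    """Stage 1: reconstruction error of the clean-speech autoencoder."""
    return np.mean((decode(clean_ae, encode(clean_ae, s)) - s) ** 2)

def noisy_loss(noisy_ae, clean_ae, x, weight=1.0):
    """Stage 2 (assumed form): mixture reconstruction plus a term that pulls
    the noisy latent code toward the clean autoencoder's latent space."""
    h = encode(noisy_ae, x)
    recon = np.mean((decode(noisy_ae, h) - x) ** 2)
    # Pass the code through the clean decoder and encoder; it should survive.
    h_cycle = encode(clean_ae, decode(clean_ae, h))
    latent = np.mean((h_cycle - h) ** 2)
    return recon + weight * latent

def enhance(noisy_ae, clean_ae, x):
    """Inference: noisy encoder followed by the clean decoder."""
    return decode(clean_ae, encode(noisy_ae, x))

# Synthetic stand-ins for unpaired clean speech and a noisy recording.
t = np.arange(4096) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)
noisy = clean + 0.3 * rng.normal(size=t.shape)

S = magnitude_spectrogram(clean)  # clean training frames
X = magnitude_spectrogram(noisy)  # noisy training frames

clean_ae = init_autoencoder(S.shape[1], 32)
noisy_ae = init_autoencoder(X.shape[1], 32)

stage1 = clean_loss(clean_ae, S)
stage2 = noisy_loss(noisy_ae, clean_ae, X)
enhanced = enhance(noisy_ae, clean_ae, X)
```

Note that neither loss pairs a noisy frame with its clean target, which is the sense in which the abstract calls the scheme trainable without labeled examples. The output of `enhance` is a magnitude spectrogram only; recovering a waveform would need a phase estimate, for example the noisy mixture's phase, which is also an assumption here.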

Double Submission: No

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/self-supervised-learning-for-speech/code)
