A Closer Look at Wav2vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: ICASSP 2024, License: CC BY-SA 4.0
Abstract: Self-supervised learning (SSL) models have been found to be very effective for tasks such as automatic speech recognition and speaker identification. However, their utility in speech enhancement systems is yet to be firmly established, and is perhaps slightly misunderstood. In this paper, we investigate the use of SSL representations for single-channel speech enhancement in challenging conditions and establish the impact they can have on the enhancement task. Our constraints are designed around on-device real-time speech enhancement: the model must be causal and its compute footprint small. Additionally, we focus on low-SNR conditions, where such models struggle to provide good performance.