Keywords: Speech enhancement, semantic information, self-supervised learning, noise robustness
TL;DR: A unified framework combining noise-invariant representation learning and generative speech enhancement for improved content preservation
Abstract: The importance of semantic information in speech enhancement (SE) has recently been emphasized as a way to improve intelligibility, whereas earlier work focused primarily on acoustic perceptual quality. To address this, recent approaches leverage pre-trained self-supervised representations, which have shown strong performance on \emph{discriminative} tasks. However, such representations are less effective for \emph{generative} tasks and, since they are typically trained only on clean data, struggle to fully preserve content under noisy or distorted conditions.
In this work, we aim to bridge this gap by introducing a unified generative SE model, called \textbf{UNISE}, that incorporates noise-invariant representation learning. By jointly training an encoder via noise-invariant clustering and a generative decoder, our model produces robust speech representations well suited to the SE task.
As a result, UNISE achieves improved linguistic content preservation while maintaining competitive perceptual quality\footnote{Audio samples are available at: \url{https://tinyurl.com/UNISE-ICLR2026}}.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23439