DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement

Published: 21 Sept 2023 · Last Modified: 11 Nov 2023 · NeurIPS 2023 poster
Keywords: speech enhancement, diffusion models, adaptive prior, dropout, generalization
TL;DR: We identify that prevailing diffusion enhancement models are susceptible to condition collapse, and we propose an efficient method to address this issue.
Abstract: Speech enhancement (SE) aims to improve the intelligibility and quality of speech in the presence of non-stationary additive noise. Deterministic deep learning models have traditionally been used for SE, but recent studies have shown that generative approaches, such as denoising diffusion probabilistic models (DDPMs), can also be effective. However, incorporating condition information into DDPMs for SE remains a challenge. We propose a model-agnostic method called DOSE that employs two efficient condition-augmentation techniques to address this challenge, based on two key insights: (1) we force the model to prioritize the condition factor when generating samples by training it with a dropout operation; (2) we inject the condition information into the sampling process by providing an informative adaptive prior. Experiments demonstrate that our approach yields substantial improvements in the quality and stability of generated speech, consistency with the condition factor, and inference efficiency. Code is publicly available at https://github.com/ICDM-UESTC/DOSE.
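For intuition, here is a minimal PyTorch sketch of the two ideas named in the abstract: condition dropout during training and an adaptive prior at sampling time. All names (`ConditionalDenoiser`, `dropout_p`, `adaptive_prior`, `alpha_bar`) and shapes are illustrative assumptions, not the authors' implementation; the actual code lives in the linked repository.

```python
# Hypothetical sketch of DOSE's two condition-augmentation ideas (PyTorch).
# Not the authors' implementation; see https://github.com/ICDM-UESTC/DOSE.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Placeholder conditional denoiser: predicts the injected noise from
    (noisy latent x_t, timestep t, condition y)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2 + 1, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x_t, t, y):
        t_emb = t.float().unsqueeze(-1)          # crude timestep embedding
        return self.net(torch.cat([x_t, y, t_emb], dim=-1))

def training_step(model, x0, y, alpha_bar, dropout_p=0.2):
    """Insight (1): randomly drop the condition during training so the
    model learns to rely on it whenever it is present."""
    b = x0.size(0)
    t = torch.randint(0, alpha_bar.size(0), (b,))
    noise = torch.randn_like(x0)
    a = alpha_bar[t].unsqueeze(-1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward diffusion
    keep = (torch.rand(b, 1) > dropout_p).float()  # Bernoulli condition mask
    eps_hat = model(x_t, t, y * keep)
    return ((eps_hat - noise) ** 2).mean()

def adaptive_prior(y_noisy, alpha_bar, t_start):
    """Insight (2): start sampling from a diffused version of the noisy
    speech itself rather than pure Gaussian noise, injecting the condition
    into the sampling process."""
    a = alpha_bar[t_start]
    return a.sqrt() * y_noisy + (1 - a).sqrt() * torch.randn_like(y_noisy)
```

The sketch mirrors the abstract's split: `training_step` applies the dropout mask to the condition before the denoiser sees it, and `adaptive_prior` replaces the usual standard-Gaussian initialization with a condition-informed one, which is also what makes fewer sampling steps plausible.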
Submission Number: 10495