RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0
Keywords: Denoising diffusion probabilistic model, prior distribution, posterior, speech enhancement, image restoration
TL;DR: This paper proposes to improve conditional denoising diffusion probabilistic models (DDPMs) by jointly learning a more informative prior distribution, instead of settling on pre-defined or handcrafted priors, for signal restoration applications.
Abstract:

Denoising diffusion probabilistic models (DDPMs) estimate the data distribution by sequentially denoising samples drawn from a prior distribution, which is typically assumed to be the standard Gaussian for simplicity. Owing to their ability to generate high-fidelity samples, DDPMs can be used for signal restoration, that is, recovering a clean signal from its degraded observation(s), by conditioning the model on the degraded signal. Because the degraded signals are contaminated versions of the clean signals, the two are correlated, and the degraded signals may carry useful information about the target clean data distribution. Naively adopting the standard Gaussian as the prior distribution discards this information. In this paper, we propose to improve conditional DDPMs for signal restoration applications by leveraging a more informative prior that is jointly learned with the diffusion model. The proposed framework, called RestoreGrad, exploits the correlation between the degraded and clean signals to construct a better prior for restoration tasks, in contrast to existing DDPMs that settle on pre-defined or handcrafted priors. To this end, we first derive a new objective function from a modified evidence lower bound (ELBO) of the data log-likelihood, to incorporate the prior learning process into conditional DDPMs. Then, we propose a corresponding joint learning paradigm for optimizing the new ELBO. Notably, RestoreGrad requires minimal modifications to the diffusion model itself; thus, it can be flexibly implemented on top of various conditional DDPM-based signal restoration models.
On speech and image restoration tasks, we show that RestoreGrad converges faster (5-10 times fewer training steps) while achieving on-par or better perceptual quality of restored signals compared with existing DDPM baselines, and that it is more robust to using fewer sampling steps at inference time (2-2.5 times fewer steps). These results demonstrate the efficiency advantages of leveraging a jointly learned prior in the diffusion process.
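
To make the role of the prior concrete, the following is a minimal NumPy sketch of the standard DDPM forward (noising) step with the prior covariance exposed as an argument. It is an illustration under stated assumptions, not the RestoreGrad implementation: the abstract does not specify the form of the learned prior, so the sketch assumes a zero-mean Gaussian with diagonal covariance, and `toy_prior_variance` is a hypothetical placeholder standing in for a prior network that would be trained jointly with the diffusion model. The noise schedule values are also arbitrary.

```python
import numpy as np


def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (values are arbitrary)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_prior_variance(y, floor=1e-2):
    """Hypothetical stand-in for a learned prior: per-sample variance derived from the degraded signal y.

    In a jointly learned setup this would be the output of a trainable network;
    here it is a fixed function, used only for illustration.
    """
    return np.maximum(y ** 2, floor)


def diffuse(x0, t, alpha_bar, prior_var, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, diag(prior_var))."""
    eps = rng.standard_normal(x0.shape) * np.sqrt(prior_var)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))   # toy "clean" signal
y = x0 + 0.1 * rng.standard_normal(x0.shape)     # toy "degraded" observation
alpha_bar = make_alpha_bar()
last = len(alpha_bar) - 1

# Standard Gaussian prior: the noise ignores the degraded signal entirely.
x_T_standard = diffuse(x0, last, alpha_bar, np.ones_like(x0), rng)

# Conditional prior: the noise variance depends on the degraded signal y.
x_T_conditional = diffuse(x0, last, alpha_bar, toy_prior_variance(y), rng)
```

With `prior_var` set to all ones, this reduces to the usual standard Gaussian case; passing a variance computed from `y` is one simple way the correlation between degraded and clean signals could enter the prior.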

Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9026