Denoising diffusion probabilistic models (DDPMs) estimate the data distribution by sequentially denoising samples drawn from a prior distribution, which is typically assumed to be the standard Gaussian for simplicity. Owing to their capability of generating high-fidelity samples, DDPMs can be utilized for signal restoration tasks, recovering a clean signal from its degraded observation(s) by conditioning the model on the degraded signal. The degraded signals are themselves contaminated versions of the clean signals; due to this correlation, they may carry useful information about the target clean data distribution. However, naively adopting the standard Gaussian as the prior distribution discards such information. In this paper, we propose to improve conditional DDPMs for signal restoration by leveraging a more informative prior that is jointly learned with the diffusion model. The proposed framework, called RestoreGrad, exploits the correlation between the degraded and clean signals to construct a better prior for restoration tasks. In contrast to existing DDPMs that settle for pre-defined or handcrafted priors, RestoreGrad learns the prior jointly with the diffusion model. To this end, we first derive a new objective function from a modified evidence lower bound (ELBO) of the data log-likelihood that incorporates the prior learning process into conditional DDPMs. We then propose a corresponding joint learning paradigm for optimizing the new ELBO. Notably, RestoreGrad requires minimal modifications to the diffusion model itself and can therefore be flexibly implemented on top of various conditional DDPM-based signal restoration models. On speech and image restoration tasks, we show that RestoreGrad converges faster (5-10 times fewer training steps) while achieving on-par or better perceptual quality of restored signals than existing DDPM baselines, along with improved robustness to using fewer sampling steps at inference time (2-2.5 times fewer steps), demonstrating the advantages of a jointly learned prior for improving the efficiency of the diffusion process.
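To make the idea concrete, the following is a minimal, hypothetical PyTorch sketch of how a prior network mapping the degraded signal to a Gaussian prior mean could be trained jointly with a conditional denoiser. It is not the paper's actual implementation: the module names (`PriorEncoder`, `Denoiser`), the shifted forward-process formula, and the weight `lambda_prior` are illustrative assumptions standing in for the ELBO-derived objective described above.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of joint prior + diffusion training; module names,
# the shifted forward process, and lambda_prior are illustrative
# assumptions, not taken from the paper.

class PriorEncoder(nn.Module):
    """Maps a degraded signal y to the mean mu(y) of a Gaussian prior N(mu, I)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, y):
        return self.net(y)

class Denoiser(nn.Module):
    """Predicts the injected noise from (x_t, y, t); a stand-in for any backbone."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x_t, y, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0  # crude scalar timestep embedding
        return self.net(torch.cat([x_t, y, t_emb], dim=-1))

def joint_loss(x0, y, prior, denoiser, alphas_cumprod, lambda_prior=1e-2):
    """One training objective: a denoising loss under a forward process shifted
    toward mu(y), plus a regularizer keeping the learned prior near the clean
    signal (a stand-in for the ELBO's prior-matching term)."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.numel(), (b,))
    a_bar = alphas_cumprod[t].unsqueeze(-1)

    mu = prior(y)  # learned prior mean, conditioned on the degraded signal
    eps = torch.randn_like(x0)
    # Forward process whose terminal distribution is N(mu(y), I) instead of N(0, I):
    # x_t = sqrt(a_bar) * x0 + (1 - sqrt(a_bar)) * mu + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * x0 + (1 - a_bar.sqrt()) * mu + (1 - a_bar).sqrt() * eps

    denoise_term = ((denoiser(x_t, y, t) - eps) ** 2).mean()
    prior_term = ((x0 - mu) ** 2).mean()
    return denoise_term + lambda_prior * prior_term
```

For instance, with `alphas_cumprod = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)` and batches of flattened signals, this loss can be backpropagated through both networks with a single optimizer, realizing the joint learning paradigm sketched above. At inference, sampling would then start from N(mu(y), I) rather than N(0, I), which is consistent with the abstract's claim that an informative learned prior permits fewer reverse sampling steps.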