RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior

Published: 01 May 2025, Last Modified: 18 Jun 2025; ICML 2025 poster; License: CC BY 4.0
TL;DR: This paper proposes an integration of conditional denoising diffusion probabilistic models (DDPMs) into the variational autoencoder (VAE) framework to jointly learn a more informative diffusion prior in signal restoration applications.
Abstract: Denoising diffusion probabilistic models (DDPMs) can be utilized to recover a clean signal from its degraded observation(s) by conditioning the model on the degraded signal. Because the degraded signals are themselves contaminated versions of the clean signals, they may carry useful information about the target clean data distribution. However, existing methods adopt the standard Gaussian as the prior distribution, discarding this information when shaping the prior and resulting in sub-optimal performance. In this paper, we propose to improve conditional DDPMs for signal restoration by leveraging a more informative prior that is jointly learned with the diffusion model. The proposed framework, called RestoreGrad, seamlessly integrates DDPMs into the variational autoencoder (VAE) framework, exploiting the correlation between the degraded and clean signals to encode a better diffusion prior. On speech and image restoration tasks, we show that RestoreGrad converges faster (5-10 times fewer training steps) while achieving better restored-signal quality than existing DDPM baselines, and is more robust to using fewer sampling steps at inference time (2-2.5 times fewer), demonstrating the advantages of a jointly learned prior for improving the efficiency of the diffusion process.
Lay Summary: Restoring clean signals—like clear speech or undistorted images—from their contaminated or degraded versions is a long-standing challenge in machine learning and signal processing. While diffusion-based generative models have shown strong potential for this task, they often rely on oversimplified assumptions about the data distribution (such as assuming the prior noise follows a standard Gaussian distribution), which hinders training efficiency and limits restoration quality. Our paper presents RestoreGrad, a novel framework that enhances diffusion-based signal restoration by learning a more informative representation of the latent noise—known as the "prior"—in tandem with the diffusion model. Unlike previous methods that use fixed or manually designed priors, RestoreGrad automatically learns this prior through a pair of encoder networks, effectively combining the generative strength of diffusion models with the modeling efficiency of variational autoencoders. RestoreGrad speeds up training by up to 10× and requires fewer inference steps, making it more efficient and practical. It improves restoration quality for both speech and images, generalizes well to unseen data, and introduces only minor computational overhead. This makes RestoreGrad a practical and scalable solution for real-world audio and visual enhancement—and potentially other signal restoration problems.
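The core idea described above (a prior over the diffusion latent learned from the degraded signal, trained VAE-style against a posterior that also sees the clean signal) can be illustrated with a minimal numpy sketch. This is not the paper's actual implementation: the encoder functions, shapes, and coefficients below are hypothetical stand-ins for the neural encoder networks, and only the diagonal-Gaussian KL regularizer and prior-based sampling step reflect the general mechanism the summary describes.

```python
import numpy as np

def diag_gauss_kl(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p))) )
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

rng = np.random.default_rng(0)
y = rng.normal(size=16)                   # toy degraded observation
x = y + rng.normal(scale=0.1, size=16)    # toy clean signal

# Hypothetical encoders; in the framework these are learned networks.
def prior_encoder(y):                     # p(z | y): prior from degraded signal only
    return 0.1 * y, np.full_like(y, -1.0)

def posterior_encoder(x, y):              # q(z | x, y): posterior sees clean + degraded
    return 0.1 * y + 0.05 * x, np.full_like(y, -1.2)

mu_p, logvar_p = prior_encoder(y)
mu_q, logvar_q = posterior_encoder(x, y)

# VAE-style KL regularizer that would be added to the usual
# DDPM denoising loss during joint training:
kl_term = diag_gauss_kl(mu_q, logvar_q, mu_p, logvar_p)

# At inference, reverse diffusion starts from the learned prior
# instead of a standard Gaussian N(0, I):
z_T = mu_p + np.exp(0.5 * logvar_p) * rng.normal(size=16)
```

The key contrast with a standard conditional DDPM is the last line: sampling begins from N(mu_p(y), Sigma_p(y)) rather than N(0, I), so the starting point already encodes information from the degraded observation.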
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Denoising diffusion probabilistic model, prior distribution, posterior, speech enhancement, image restoration
Submission Number: 13761