Learning few-step posterior samplers by unfolding and distillation of diffusion models

TMLR Paper5283 Authors

03 Jul 2025 (modified: 18 Oct 2025). Under review for TMLR. License: CC BY 4.0
Abstract: Diffusion models (DMs) have emerged as powerful image priors in Bayesian computational imaging. Two primary strategies have been proposed for leveraging DMs in this context: Plug-and-Play methods, which are zero-shot and highly flexible but rely on approximations; and specialized conditional DMs, which achieve higher accuracy and faster inference on specific tasks through supervised training. In this work, we introduce a novel framework that integrates deep unfolding and model distillation to transform a DM image prior into a few-step conditional model for posterior sampling. A central innovation of our approach is the unfolding of a Markov chain Monte Carlo (MCMC) algorithm (specifically, the recently proposed LATINO Langevin sampler of Spagnoletti et al., 2025), representing the first known instance of deep unfolding applied to a Monte Carlo sampling scheme. We demonstrate the proposed unfolded and distilled samplers through extensive experiments and comparisons with the state of the art, where they achieve excellent accuracy and computational efficiency while retaining the flexibility to adapt to variations in the forward model at inference time.
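For readers unfamiliar with LATINO-style samplers, the following is a minimal sketch of the kind of plug-and-play Langevin step that gets unfolded; the step size $\delta$, noise level $\sigma$ and denoiser $D_\theta$ are illustrative notation of ours, not the paper's exact scheme:
$$
x_{k+1} = x_k + \delta \,\nabla_x \log p(y \mid x_k) + \delta \,\frac{D_\theta(x_k, \sigma) - x_k}{\sigma^2} + \sqrt{2\delta}\, z_k, \qquad z_k \sim \mathcal{N}(0, I),
$$
where the prior score $\nabla_x \log p_\sigma(x)$ is approximated by the diffusion-model denoiser via Tweedie's formula and the likelihood gradient encodes the forward model. Deep unfolding, in the usual sense, truncates this chain to a fixed number of steps $K$ and trains through them end to end.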
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We are grateful to each of the referees for their careful reading and constructive feedback on our manuscript. Modifications to the structure of the document aim to clarify the setup, notation and contributions of this work. Additionally, we include two new appendices, summarised below.

Appendix B formalises the link between our UD2M model, the LATINO iterative MCMC kernel and Langevin dynamics. In particular, we formulate LATINO as a plug-and-play splitting scheme for approximating the Langevin diffusion with a score-based denoiser from a diffusion (or consistency) model, and we compare in more detail the unfolded UD2M approach with the use of the LATINO transition kernel in Spagnoletti et al. (2025).

Appendix D.1 includes an additional ablation study for a new experiment on $4\times$ super-resolution of MNIST digits. The purpose of this experiment is twofold: to efficiently train several instances of our unfolded model with a varying number of unfolded timesteps $K$; and to conduct a robust comparison between the learned posterior measures and the empirical joint distribution available from the training data. In particular, we compute a variant of the Fréchet Inception Distance using a VAE embedding network that is pretrained to accurately encode MNIST digits into a Gaussian distribution; the resulting FID score is therefore a faithful measure of the distance between distributions on MNIST-type data (see the formula below). To compare posterior measures, we additionally compute a conditional variant of this modified FID, as in Soloveitchik et al. (2021), together with empirical coverage probabilities.
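For reference, the modified FID described above is the standard Fréchet distance between Gaussian fits $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ to the two sets of embeddings, computed here in the VAE embedding space rather than the usual Inception feature space:
$$
d_F^2 = \lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{tr}\!\left( \Sigma_1 + \Sigma_2 - 2\,(\Sigma_1 \Sigma_2)^{1/2} \right).
$$
The conditional variant of Soloveitchik et al. (2021) extends this distance to conditional (per-measurement) statistics, which is what allows posterior measures, rather than marginals, to be compared.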
Assigned Action Editor: ~Trevor_Campbell1
Submission Number: 5283