Self-Supervised Diffusion Model Sampling With Reinforcement Learning

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Reinforcement Learning, Diffusion Models, Generative Models, Self-Supervised
TL;DR: Using self-supervised reinforcement learning to make diffusion models denoise faster while maximizing sample quality
Abstract: Diffusion models have established themselves as the state of the art for generative modeling, dethroning Generative Adversarial Networks (GANs) by generating higher-quality samples while remaining more stable throughout training. However, diffusion models generate samples iteratively and remain slow at inference time. Our work proposes to leverage reinforcement learning (RL) to accelerate inference, building on the recent framing of diffusion's iterative denoising process as a sequential decision-making problem. Specifically, our approach learns a scheduler policy that maximizes sample quality while remaining within a fixed budget of denoising steps. Importantly, our method is agnostic to the underlying diffusion model and does not re-train it. Finally, unlike previous RL approaches that rely on supervised pairs of noise and corresponding denoised images, our method is self-supervised and directly maximizes similarity in dataset feature space. Overall, our approach offers a more flexible and efficient framework for improving the speed and quality of diffusion model inference.
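
To make the abstract's setup concrete, below is a minimal illustrative sketch of one way such a scheduler policy could be trained; it is not the authors' implementation. The policy picks the next denoising timestep for a frozen, pretrained denoiser under a fixed step budget, and is updated with a simple REINFORCE objective whose reward is cosine similarity to pre-extracted dataset features. All names (SchedulerPolicy, rollout, feature_similarity_reward, and the denoiser and feature_extractor arguments) are hypothetical placeholders, assuming a PyTorch setting.

import torch
import torch.nn as nn

class SchedulerPolicy(nn.Module):
    # Maps the current latent and the remaining budget to a distribution over the next timestep.
    def __init__(self, latent_dim: int, num_timesteps: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, num_timesteps),
        )

    def forward(self, latent_flat, steps_left):
        logits = self.net(torch.cat([latent_flat, steps_left], dim=-1))
        return torch.distributions.Categorical(logits=logits)

def rollout(policy, denoiser, x_T, budget):
    # Denoise for at most `budget` steps, with the policy choosing each timestep.
    x, log_probs = x_T, []
    for k in range(budget):
        steps_left = torch.full((x.shape[0], 1), float(budget - k), dtype=x.dtype)
        dist = policy(x.flatten(1), steps_left)
        t = dist.sample()
        log_probs.append(dist.log_prob(t))
        with torch.no_grad():
            x = denoiser(x, t)  # frozen pretrained diffusion model; never updated
    return x, torch.stack(log_probs).sum(0)

def feature_similarity_reward(samples, data_features, feature_extractor):
    # Self-supervised reward: cosine similarity between generated samples and
    # (pre-normalized) dataset features, taking the nearest dataset feature per sample.
    f = feature_extractor(samples)
    f = f / f.norm(dim=-1, keepdim=True)
    return (f @ data_features.t()).max(dim=-1).values

def train_step(policy, optimizer, denoiser, feature_extractor, data_features, x_T, budget):
    # REINFORCE update: only the scheduler policy receives gradients.
    samples, log_prob = rollout(policy, denoiser, x_T, budget)
    reward = feature_similarity_reward(samples, data_features, feature_extractor)
    loss = -(reward.detach() * log_prob).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item(), reward.mean().item()

In this reading, "agnostic to the underlying diffusion model" corresponds to the denoiser being called as a black box inside torch.no_grad(), and "self-supervised" corresponds to the reward being computed purely from dataset features rather than paired noise/image supervision.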
Primary Area: reinforcement learning
Submission Number: 14187