Temporal Difference Learning for Diffusion Models

20 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Diffusion Models, Temporal-difference learning, Generative models, Reinforcement learning
Abstract: Diffusion models are typically trained with reconstruction losses at single, isolated time steps, which does not enforce consistency between predictions along the denoising trajectory. This lack of cross-time consistency can degrade performance, especially for few-step samplers. We introduce a temporal difference (TD) objective that penalizes inconsistency in the model's multi-step progress along the denoising path. By reformulating the diffusion process as a Markov reward process and casting the denoising task as a policy evaluation problem in reinforcement learning, we derive a unified TD approach that applies to both discrete- and continuous-time diffusion formulations. We further propose a principled sample-based reweighting method that stabilizes training. Empirically, we show that adding our TD objective can significantly improve sample efficiency and sample quality, as measured by FID. In particular, TD exhibits stronger advantages when the number of sampling steps is small, highlighting its practical utility under low-computation-budget scenarios. We provide extensive ablation studies to justify our design choices, including loss reweighting, regularization weight, and one-step distance. Overall, our TD approach can serve as a general drop-in objective that enforces cross-time consistency and improves fixed-NFE generation quality, with potential utility across a wide range of diffusion generative models.
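The abstract does not give implementation details, but the core idea (penalizing disagreement between the model's prediction at one noise level and a bootstrapped prediction one denoising step later) can be illustrated. Below is a minimal sketch of such a TD-style consistency regularizer, assuming a denoiser `model(x_t, t)` that predicts the clean sample; `sampler_step` (a hypothetical one-step denoising transition) and `weight_fn` (standing in for the paper's sample-based reweighting) are illustrative placeholders, not the authors' implementation.

```python
import torch

def td_consistency_loss(model, x_t, t, dt, sampler_step, weight_fn):
    """TD-style cross-time consistency regularizer (illustrative sketch).

    Penalizes disagreement between the model's prediction at time t and a
    bootstrapped target obtained by taking one sampler step to t - dt and
    re-querying the model there. As in TD learning, the target is held
    fixed (no gradient flows through it).
    """
    pred = model(x_t, t)                          # prediction at time t
    with torch.no_grad():                         # TD target: stop-gradient
        x_prev = sampler_step(model, x_t, t, dt)  # one denoising step t -> t - dt
        target = model(x_prev, t - dt)            # bootstrapped prediction
    per_sample = ((pred - target) ** 2).flatten(1).mean(dim=1)
    return (weight_fn(t) * per_sample).mean()     # sample-based reweighting
```

In practice this term would be added to the standard single-step reconstruction loss with a regularization weight; the choice of one-step distance `dt` and the form of `weight_fn` correspond to the ablations the abstract mentions.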
Primary Area: generative models
Submission Number: 24583