Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching

TMLR Paper5803 Authors

03 Sept 2025 (modified: 06 Nov 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: Diffusion models are commonly interpreted as learning the score function, i.e., the gradient of the log-density of noisy data. However, this assumption implies that the target of learning is a conservative vector field, which is not enforced by the neural network architectures used in practice. We present numerical evidence that trained diffusion networks violate both integral and differential constraints required of true score functions, demonstrating that the learned vector fields are not conservative. Despite this, the models perform remarkably well as generative mechanisms. To explain this apparent paradox, we advocate a new theoretical perspective: diffusion training is better understood as flow matching to the velocity field of a Wasserstein Gradient Flow (WGF), rather than as score learning for a reverse-time stochastic differential equation. Under this view, the "probability flow" arises naturally from the WGF framework, eliminating the need to invoke reverse-time SDE theory and clarifying why generative sampling remains successful even when the neural vector field is not a true score. We further show that non-conservative errors from neural approximation do not necessarily harm density transport. Our results advocate for adopting the WGF perspective as a principled, elegant, and theoretically grounded framework for understanding diffusion generative models.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=aZp4HQdhYg
Changes Since Last Submission: Revised the submission based on input from reviewers.
- Revised the overall organization of the paper:
  - Added a proper introduction.
  - Added some suggested citations in the literature review.
  - The main idea is now separated into its own Section 4.
- Added more experimental results on CelebA-HQ-256 and Neal's funnel distribution.
- Revised the general tone and wording to enhance the clarity of the discussion.
- Fixed typos and formatting errors as noted by the reviewers.
Assigned Action Editor: ~Jes_Frellsen1
Submission Number: 5803