Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching

Published: 20 Dec 2025, Last Modified: 20 Dec 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Diffusion models are commonly interpreted as learning the score function, i.e., the gradient of the log-density of noisy data. This learning target, however, is a conservative vector field (a vector field that is the gradient of some function), a property not enforced by the neural network architectures used in practice. We show numerically that trained diffusion networks violate both the integral and differential constraints that conservative vector fields must satisfy, indicating that the learned vector fields are not the score functions of any density. Despite this, the models perform remarkably well as generative mechanisms. To explain this paradox, we propose a new theoretical perspective: diffusion training is better understood as \emph{flow matching} to the velocity field of a Wasserstein Gradient Flow (WGF), rather than as score learning for a reverse-time stochastic differential equation. Under this view, the "probability flow" arises naturally from the WGF framework, eliminating the need to invoke reverse-time SDE theory and clarifying why generative sampling remains successful even when the neural vector field is not a true score. We further show that non-conservative errors introduced by neural approximation do not necessarily harm density transport. Our results advocate for the WGF perspective as a principled, elegant, and theoretically grounded framework for understanding diffusion generative models.
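To make the two constraints mentioned in the abstract concrete, below is a minimal PyTorch sketch of how one might probe them numerically. The small 2-D network `net`, the single fixed noise level, and the loop radius are illustrative assumptions, not the paper's experimental protocol: a conservative field $s = \nabla \log p$ must have a symmetric Jacobian ($\partial_i s_j = \partial_j s_i$, the differential constraint) and zero circulation around any closed loop ($\oint_\gamma s \cdot \mathrm{d}x = 0$, the integral constraint).

```python
import torch

# Hypothetical 2-D toy network standing in for a trained diffusion model's
# learned vector field at one fixed noise level (illustrative assumption;
# the paper's architectures and data dimensionality are not specified here).
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2)
)

def vector_field(x):
    """Learned vector field s_theta(x), evaluated at a single point x of shape (2,)."""
    return net(x)

x0 = torch.randn(2)

# Differential constraint: a conservative field has a symmetric Jacobian,
# ds_i/dx_j == ds_j/dx_i, so the asymmetry below should be ~0 for a true score.
J = torch.autograd.functional.jacobian(vector_field, x0)
print(f"max |J - J^T| = {(J - J.T).abs().max().item():.3e}")

# Integral constraint: the circulation of a conservative field around a closed
# loop vanishes; approximate the line integral on a small circle centred at x0.
thetas = torch.linspace(0.0, 2.0 * torch.pi, 513)
radius = 0.1
circulation = torch.tensor(0.0)
for a, b in zip(thetas[:-1], thetas[1:]):
    p = x0 + radius * torch.stack([torch.cos(a), torch.sin(a)])
    q = x0 + radius * torch.stack([torch.cos(b), torch.sin(b)])
    midpoint = 0.5 * (p + q)  # midpoint rule for the segment from p to q
    circulation = circulation + vector_field(midpoint) @ (q - p)
print(f"closed-loop integral = {circulation.item():.3e}")  # ~0 for a true score
```

For an untrained or generic trained network, both diagnostics are typically far from zero, which is the kind of numerical evidence of non-conservativeness the abstract refers to; this sketch only illustrates the checks, not the paper's actual measurements.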
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=aZp4HQdhYg
Changes Since Last Submission: This revision modifies the author style to match the requirements.
Assigned Action Editor: ~Jes_Frellsen1
Submission Number: 5803