A Diffusion Model Induced by MSE Training

ICLR 2026 Conference Submission20837 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, readers: Everyone, license: CC BY 4.0

Keywords: diffusion models, score-based generative modeling, generative models, deep learning

TL;DR: MSE-Induced Diffusion

Abstract: A diffusion model for image generation transforms noise into an image via a neural denoiser. The denoiser is trained with a time-integrated, weighted mean-squared error (MSE) between a regression target (derived from the clean image and the added noise) and the network's prediction. The weighting is often absorbed into the target, yielding different parameterizations of the prediction (e.g., noise, data, or velocity parameterization). Thus, the denoiser is determined by the noise schedule and the chosen parameterization, whereas the generative diffusion process is specified by its noise and diffusion schedules (i.e., by both the scale and the variance-rate coefficients). In practice, the generator typically inherits only the noise schedule from the trained denoiser. In this work, guided by a principle of coherence between the MSE training objective and maximum-likelihood (ML) proximity of the induced processes, we derive a closed-form expression for the diffusion schedule given a noise schedule and a network parameterization. Widely used methods train on one (implicit) process but generate with another: often one whose diffusion schedule is optimal in the ML sense, or even one with zero diffusion, that is, a deterministic flow. Recent empirical approaches yield diffusion schedules closer to our formula, which supports the coherence principle and suggests that it is beneficial to generate samples using the very process that is actually learned. We analyze both discrete-time and continuous-time models using elementary autoregressive arguments, yielding formulas that are simpler than those in prior work. In particular, we represent the diffusion state as the sum of an explicit linear component, an unweighted pathwise integral of the denoiser, and a noise term. This representation makes it straightforward to apply classical numerical integration methods and clarifies the relation to the DPM-Solver family.
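The claim that the loss weighting can be absorbed into the target can be made concrete under an assumption not stated on this page: a Gaussian noising x_t = α_t x_0 + σ_t ε, with velocity defined as v = α_t ε − σ_t x_0 (a convention common for variance-preserving schedules). The following is a minimal sketch, not the paper's formulation. It converts a data prediction into the implied noise and velocity predictions and checks that the squared errors differ from the data-space error only by time-dependent weights: SNR_t = α_t²/σ_t² for noise prediction and ((α_t² + σ_t²)/σ_t)² for velocity prediction. The schedule and all function names are hypothetical.

```python
import math
import random

# Assumed forward noising (not taken from the page): x_t = alpha * x0 + sigma * eps.


def eps_from_x0(x_t, x0_hat, alpha, sigma):
    """Noise prediction implied by a data prediction x0_hat."""
    return (x_t - alpha * x0_hat) / sigma


def v_from_x0(x_t, x0_hat, alpha, sigma):
    """Velocity prediction implied by x0_hat, using v = alpha * eps - sigma * x0."""
    return alpha * eps_from_x0(x_t, x0_hat, alpha, sigma) - sigma * x0_hat


def implied_weights(alpha, sigma):
    """Factors w with (squared error in a parameterization) = w * (x0 error)^2."""
    snr = (alpha / sigma) ** 2
    return {"x0": 1.0, "eps": snr, "v": ((alpha**2 + sigma**2) / sigma) ** 2}


def squared_errors(x0, eps, x0_hat, alpha, sigma):
    """Squared error of one data prediction, measured in each parameterization."""
    x_t = alpha * x0 + sigma * eps
    v = alpha * eps - sigma * x0
    return {
        "x0": (x0_hat - x0) ** 2,
        "eps": (eps_from_x0(x_t, x0_hat, alpha, sigma) - eps) ** 2,
        "v": (v_from_x0(x_t, x0_hat, alpha, sigma) - v) ** 2,
    }


if __name__ == "__main__":
    random.seed(0)
    t = 0.3
    # Hypothetical cosine variance-preserving schedule: alpha^2 + sigma^2 = 1.
    alpha, sigma = math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)
    x0, eps = random.gauss(0, 1), random.gauss(0, 1)
    x0_hat = x0 + 0.1  # a deliberately imperfect data prediction
    errs = squared_errors(x0, eps, x0_hat, alpha, sigma)
    w = implied_weights(alpha, sigma)
    for name in ("eps", "v"):
        # Direct error vs. weighted data-space error: the two numbers agree.
        print(name, errs[name], w[name] * errs["x0"])
```

For a variance-preserving schedule (α² + σ² = 1), the velocity weight reduces to 1/σ² = 1 + SNR, so the three parameterizations are the same regression problem under different time weightings.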
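The abstract describes the diffusion state as a linear part plus an unweighted pathwise integral of the denoiser, plus a noise term. A well-known deterministic counterpart may help build intuition. It is not the paper's formula, which also carries the noise term. Assume the same Gaussian noising and write λ_t = α_t/σ_t. Then the zero-diffusion (probability-flow) dynamics satisfy d(x_t/σ_t) = D(x_t, t) dλ_t, where D is the data-prediction denoiser. Hence x_t/σ_t equals its starting value plus ∫ D dλ. Applying Euler's rule to that integral reproduces the deterministic DDIM update (also the first-order DPM-Solver++ step), and Heun's trapezoidal rule is a classical second-order alternative. The sketch below uses a toy one-dimensional Gaussian data distribution, for which both the exact denoiser and the exact flow map are known in closed form. The schedule, step counts, and names are hypothetical.

```python
import math

# Toy setup (an assumption for illustration): x0 ~ N(0, s^2) in one dimension,
# x_t = alpha_t * x0 + sigma_t * eps, with a cosine variance-preserving schedule.


def schedule(t):
    """Hypothetical VP schedule: alpha^2 + sigma^2 = 1; t = 1 is (almost) pure noise."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def denoiser(x, t, s):
    """Exact posterior mean E[x0 | x_t = x] for Gaussian data with std s."""
    alpha, sigma = schedule(t)
    return alpha * s**2 * x / (alpha**2 * s**2 + sigma**2)


def exact_flow(x_start, t_start, t_end, s):
    """Closed-form probability-flow map for Gaussian data: a linear rescaling."""
    a0, s0 = schedule(t_start)
    a1, s1 = schedule(t_end)
    return x_start * math.sqrt(a1**2 * s**2 + s1**2) / math.sqrt(a0**2 * s**2 + s0**2)


def integrate(x_start, n_steps, s, t_min=0.05, method="euler"):
    """Integrate d(x / sigma) = D(x, t) d(alpha / sigma) from t = 1 down to t_min."""
    ts = [1.0 - (1.0 - t_min) * k / n_steps for k in range(n_steps + 1)]
    x = x_start
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a0, s0 = schedule(t_cur)
        a1, s1 = schedule(t_next)
        h = a1 / s1 - a0 / s0  # increment of lambda = alpha / sigma
        y = x / s0  # rescaled state x / sigma
        d0 = denoiser(x, t_cur, s)
        y_next = y + h * d0  # Euler step: the deterministic DDIM update
        if method == "heun":
            d1 = denoiser(s1 * y_next, t_next, s)
            y_next = y + 0.5 * h * (d0 + d1)  # trapezoidal correction
        x = s1 * y_next
    return x


if __name__ == "__main__":
    s, x_start, t_min = 2.0, 1.0, 0.05
    ref = exact_flow(x_start, 1.0, t_min, s)
    for method in ("euler", "heun"):
        approx = integrate(x_start, 100, s, t_min, method)
        print(method, approx, abs(approx - ref) / abs(ref))  # value, relative error
```

With these settings, the Heun estimate should land noticeably closer to the closed-form map than the Euler estimate, as expected from a second-order versus a first-order rule. The toy problem is linear and deterministic, so it does not exercise the stochastic, diffusion-schedule-dependent sampler the abstract refers to.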

Primary Area: generative models

Submission Number: 20837