Keywords: diffusion models, probability flow ODEs, score-based generative models, convergence analysis
Abstract: Diffusion-based generative models have emerged as highly effective methods for synthesizing high-quality samples. Recent works have focused on analyzing the convergence of their generation process with minimal assumptions, either through reverse SDEs or probability flow ODEs. Without any smoothness assumptions, the best known guarantees for the KL divergence so far achieve a linear dependence on the data dimension $d$ and an inverse quadratic dependence on the accuracy level $\varepsilon$. In this work, we present a refined analysis for the standard Exponential Integrator discretization that improves the dependence on $\varepsilon$ while maintaining the linear dependence on $d$. Following recent works on higher-order/randomized-midpoint discretizations, we model the generation process as a composition of two steps: a reverse ODE step followed by a smaller noising step, which leads to a better dependence on the step size. We then provide a novel analysis that achieves linear dependence on $d$ for the ODE discretization error without any smoothness assumptions.
Specifically, we introduce a general ODE-based counterpart of the stochastic localization argument of Benton et al. and develop new proof techniques to bound second-order spatial derivatives of the score function, terms that do not arise in previous diffusion analyses and cannot be handled by existing techniques. Leveraging this framework, we prove that $\tilde{O}\left(\tfrac{d \log^{3/2}(1/\delta)}{\varepsilon}\right)$ steps suffice to approximate the target distribution, corrupted by Gaussian noise of variance $\delta$, to within $O(\varepsilon^2)$ in KL divergence, improving upon the previous best result, which requires $\tilde{O}\left(\tfrac{d \log^2(1/\delta)}{\varepsilon^2}\right)$ steps.
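For a concrete picture of the two-step generation process described in the abstract, below is a minimal Python/NumPy sketch, assuming the standard Ornstein-Uhlenbeck forward process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ and an exponential-integrator step of its probability flow ODE followed by a small Gaussian noising step. The step schedule, the noise variance, and the `score_fn` interface are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

def sample_ode_plus_noise(score_fn, d, T, delta, n_steps, noise_scale=0.5, rng=None):
    """Hypothetical sketch of the two-step sampler: each iteration applies an
    exponential-integrator step of the probability flow ODE (score frozen at
    the current iterate), followed by a smaller Gaussian noising step.
    Assumes the Ornstein-Uhlenbeck forward process; schedule and noise
    variance are placeholder choices for illustration only."""
    rng = np.random.default_rng(rng)
    # Initialize from the stationary distribution N(0, I) of the forward process.
    y = rng.standard_normal(d)
    # Uniform grid on [0, T - delta]; the reverse run is stopped at time delta,
    # i.e. it targets the data distribution corrupted by Gaussian noise of variance ~ delta.
    ts = np.linspace(0.0, T - delta, n_steps + 1)
    for k in range(n_steps):
        h = ts[k + 1] - ts[k]
        t_fwd = T - ts[k]                # forward time of the current iterate
        s = score_fn(y, t_fwd)           # estimated score grad log p_{t_fwd}(y)
        # Exponential-integrator step for dy/dt = y + s with s frozen over [0, h]:
        y = np.exp(h) * y + (np.exp(h) - 1.0) * s
        # Smaller noising step (variance proportional to the step size; the
        # proportionality constant is a placeholder assumption).
        y = y + np.sqrt(noise_scale * h) * rng.standard_normal(d)
    return y
```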
Primary Area: learning theory
Submission Number: 8099