FullDiffusion: Diffusion Models Without Time Truncation

Published: 06 Mar 2025, Last Modified: 14 Mar 2025 · ICLR 2025 DeLTa Workshop Poster · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: diffusion models, time truncation
Abstract: Diffusion models are predominantly used for generative modeling: they synthesize samples by simulating the reverse process of a stochastic differential equation (SDE) that diffuses data into Gaussian noise. When simulating the reverse SDE, however, the SDE solver suffers from numerical instability near the time boundary, so in practice the simulation is terminated before reaching the boundary point. This heuristic time truncation hinders a rigorous formulation of diffusion models and incurs additional hyperparameter-tuning costs. Moreover, the same numerical instability often arises during training, especially under a maximum likelihood loss, so current diffusion models rely heavily on time truncation in both training and inference. In this paper, we propose a method that completely eliminates the heuristic of time truncation. Our method removes the numerical instability of maximum likelihood training by modifying the parameterization of the noise predictor and the noise schedule. We also propose a novel SDE solver that can simulate the reverse SDE without time truncation by exploiting its semi-linear structure. Together, these improvements enable stable training and sampling of diffusion models without time truncation. In our experiments, we evaluate the method on the CIFAR-10 and ImageNet-32 datasets in terms of test likelihood and sample quality measured by the Fréchet inception distance (FID), and observe that it consistently improves both metrics over the DDPM++ baseline.
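For intuition on the semi-linear structure the abstract refers to: the drift of the reverse VP-SDE is a linear term in x plus a nonlinear denoiser/score term, and an exponential integrator handles the linear part exactly, which lets the step size extend all the way to t = 0. Below is a minimal, self-contained sketch of this general idea, using the first-order exponential-integrator SDE update popularized by DPM-Solver++ as a stand-in rather than the solver proposed in the paper; the schedule constants b0 and b1, the toy Gaussian data, and the closed-form denoiser x0_pred are all illustrative assumptions.

```python
import numpy as np

# Sketch: first-order exponential-integrator step for the reverse VP-SDE
# (the SDE-DPM-Solver++ update), a stand-in for the paper's solver.
b0, b1, s0 = 0.1, 20.0, 1.0  # schedule endpoints and data std (assumed)

def alpha(t):
    # Signal scale for a VP diffusion with beta(t) = b0 + t * (b1 - b0).
    return np.exp(-0.25 * t**2 * (b1 - b0) - 0.5 * t * b0)

def sigma(t):
    # Noise scale; alpha(t)^2 + sigma(t)^2 = 1 for the VP process.
    return np.sqrt(1.0 - alpha(t) ** 2)

def x0_pred(x, t):
    # Exact posterior mean E[x0 | x_t] for toy data x0 ~ N(0, s0^2);
    # stands in for a learned denoiser / noise predictor.
    a, s = alpha(t), sigma(t)
    return (a * s0**2 / (a**2 * s0**2 + s**2)) * x

def sde_step(x, t, s, rng):
    # One step from time t down to s < t. The linear part of the semi-linear
    # reverse drift is integrated exactly, so the update stays well defined
    # at the boundary: sigma(0) = 0 gives exp(-h) = 0, and the final step
    # reduces to x_0 = x0_pred(x_t, t) -- no time truncation needed.
    a_t, s_t, a_s, s_s = alpha(t), sigma(t), alpha(s), sigma(s)
    exp_neg_h = (a_t / s_t) * (s_s / a_s)  # e^{-(lambda_s - lambda_t)}
    z = rng.standard_normal(np.shape(x))
    return ((s_s / s_t) * exp_neg_h * x
            + a_s * (1.0 - exp_neg_h**2) * x0_pred(x, t)
            + s_s * np.sqrt(1.0 - exp_neg_h**2) * z)

rng = np.random.default_rng(0)
ts = np.linspace(1.0, 0.0, 101)  # the time grid includes t = 0 itself
x = rng.standard_normal(10_000)  # with s0 = 1, the t = 1 marginal is N(0, 1)
for t, s in zip(ts[:-1], ts[1:]):
    x = sde_step(x, t, s, rng)
print("sample std:", x.std(), "(target:", s0, ")")  # should be close to 1.0
```

Because exp(-h) vanishes as sigma(s) goes to 0, the last update collapses to the denoiser output rather than dividing by a vanishing noise scale, which is why exploiting the semi-linear structure removes the need to stop the simulation early.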
Submission Number: 116