Abstract: Diffusion models have achieved significant success in generative tasks, emerging as the predominant models in this domain. Despite this success, the substantial computational resources required for training restrict their practical application. In this paper, we draw on optimal transport theory to accelerate the training of diffusion models, providing an in-depth analysis of the forward diffusion process. The analysis shows that the upper bound on the Wasserstein distance between the state distributions at any two timesteps decays exponentially from the initial distance as the timestep gap grows. This finding indicates that the state distribution of the diffusion process changes at a non-uniform rate over time, so different timesteps carry different importance. Motivated by this, we propose a novel non-uniform timestep sampling method based on the Bernoulli distribution, which samples more frequently in the significant timestep intervals. The key idea is to make the model focus on timesteps with larger distributional differences, thereby accelerating training. Experiments on benchmark datasets show that the proposed method significantly reduces computational overhead while improving the quality of the generated images.
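The abstract's Bernoulli-based non-uniform sampling could be sketched as a mixture sampler: a Bernoulli trial decides whether the training timestep is drawn from the "significant" interval or from the rest of the range. This is a minimal illustrative sketch, not the paper's actual scheme; the cutoff `split`, the probability `p_sig`, and all function names are assumptions.

```python
import numpy as np

def sample_timestep(T=1000, split=300, p_sig=0.7, rng=None):
    """Hedged sketch of Bernoulli-driven non-uniform timestep sampling.

    With probability p_sig (a Bernoulli draw) the timestep is taken
    uniformly from the hypothetically "significant" interval [0, split),
    where the state distribution is assumed to change fastest; otherwise
    it is taken uniformly from [split, T). The cutoff and probability
    are illustrative placeholders, not values from the paper.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < p_sig:  # Bernoulli trial selects the interval
        return int(rng.integers(0, split))
    return int(rng.integers(split, T))

# Empirical check: the significant interval should dominate the samples.
rng = np.random.default_rng(0)
samples = [sample_timestep(rng=rng) for _ in range(10_000)]
frac_sig = sum(t < 300 for t in samples) / len(samples)
```

In a training loop, such a sampler would simply replace the usual uniform draw of `t` when computing the denoising loss at each step.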
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Generation] Multimedia Foundation Models
Relevance To Conference: Diffusion models are widely used in multimedia generation. In this paper, we propose an improved timestep sampling strategy for diffusion model training. We analyze the diffusion model through the lens of optimal transport and, based on this analysis, design a new timestep sampling method that substantially improves both the training speed and the generation quality of diffusion models, and thus of the multimedia generation models built on them.
Supplementary Material: zip
Submission Number: 892