Keywords: resource-efficient training, deep learning, generative models, diffusion models
Abstract: Diffusion models have emerged as a new standard technique in generative AI due to their success in a wide range of applications. However, their training can be prohibitively resource- and time-consuming, resulting in a high carbon footprint. To address this issue, we propose a novel and practical training strategy that significantly reduces training time while even enhancing generation quality. We observe that diffusion models exhibit different convergence rates and training patterns at different time steps, which inspires our MDM (Multi-expert Diffusion Model). Each expert specializes in a group of time steps with similar training patterns, and we exploit the variation in the number of iterations required for convergence across local experts to substantially reduce total training time. Our method improves the training efficiency of diffusion models by (1) reducing total GPU hours and (2) enabling parallel training of experts without overhead, further reducing wall-clock time. Applied to three baseline models, MDM accelerates training by a factor of 2.7 to 4.7 over the corresponding baselines while reducing computational resources by 24-53%. Furthermore, our method improves FID by 7.7% on average across all datasets and models.
Submission Number: 2
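The abstract does not specify the implementation, but the multi-expert idea can be illustrated with a minimal PyTorch sketch: the diffusion time-step range is split into groups, one denoiser ("expert") is kept per group, and each sample is routed to the expert whose group contains its time step. The expert factory, the equal-width grouping, and the routing helper below are illustrative assumptions, not the authors' method (the paper groups steps by similar training patterns, which need not be uniform).

```python
# Minimal sketch of a multi-expert denoiser, assuming PyTorch.
# Assumptions: equal-width time-step groups and a toy per-expert network;
# the actual MDM grouping and expert architecture are defined in the paper.
import torch
import torch.nn as nn


class MultiExpertDenoiser(nn.Module):
    """Routes each noisy sample to the expert owning its diffusion time step."""

    def __init__(self, make_expert, num_experts: int, num_timesteps: int = 1000):
        super().__init__()
        self.num_timesteps = num_timesteps
        # Equal-width groups of time steps (an assumption for this sketch).
        bounds = torch.linspace(0, num_timesteps, num_experts + 1).long()
        self.register_buffer("bounds", bounds)
        self.experts = nn.ModuleList(make_expert() for _ in range(num_experts))

    def expert_index(self, t: torch.Tensor) -> torch.Tensor:
        # Map each time step to the index of the expert whose group contains it.
        return torch.bucketize(t, self.bounds[1:-1], right=True)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Dispatch each batch element to its expert. Because experts are
        # independent modules, each group can also be trained as a separate
        # job, which is what enables parallel training without overhead.
        out = torch.empty_like(x_t)
        idx = self.expert_index(t)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():
                out[mask] = expert(x_t[mask], t[mask])
        return out


class ToyExpert(nn.Module):
    """Toy stand-in for a denoising network, conditioned on the time step."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t.float().unsqueeze(-1)], dim=-1))


# Usage: predict noise for a batch whose time steps span several experts.
model = MultiExpertDenoiser(lambda: ToyExpert(dim=8), num_experts=4)
x = torch.randn(16, 8)
t = torch.randint(0, 1000, (16,))
noise_pred = model(x, t)  # shape (16, 8)
```

In this sketch the per-expert partition is what allows each group to stop training as soon as it converges, which is the source of the GPU-hour savings the abstract describes.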