TADA: Timestep-Aware Data Augmentation for Diffusion Models

18 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: data augmentation, generative models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a timestep-aware data augmentation strategy that can enhance the performance of diffusion models in limited data settings.
Abstract: Data augmentation is a popular technique for improving the generalization of neural networks, particularly when data is limited. However, naively applying augmentation to generative models can cause a distribution shift, producing unintended augmented-like output samples. While this issue has been actively studied for generative adversarial networks (GANs), little attention has been paid to diffusion models despite their widespread use. In this paper, we conduct the first comprehensive study of data augmentation for diffusion models, primarily investigating the relationship between distribution shift and data augmentation. Our study reveals that distribution shift in diffusion models originates exclusively from specific timestep intervals, rather than from the entire timestep range. Based on these findings, we introduce a simple yet effective data augmentation strategy that flexibly adjusts the augmentation strength depending on the timestep. Experiments across diverse diffusion model settings (e.g., noise schedule, model size, and sampling steps), datasets, and training setups (e.g., training from scratch or transfer learning) show that our approach is applicable across different design choices, with minimal changes to the data processing pipeline. We expect our data augmentation method to benefit a wide range of diffusion model designs and tasks. We will make our code publicly available.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1139
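
To give a concrete picture of what "adjusting the augmentation strength depending on the timestep" could look like in a standard DDPM-style training loop, here is a minimal, hypothetical sketch in PyTorch. The linear ramp schedule, the threshold `t_min`, the cap `max_prob`, and the choice of a horizontal flip are illustrative assumptions for this sketch only, not the paper's actual method or hyperparameters.

```python
import torch

def augmentation_strength(t: torch.Tensor, num_timesteps: int,
                          t_min: int = 400, max_prob: float = 0.8) -> torch.Tensor:
    """Per-sample augmentation probability in [0, max_prob]: zero below t_min,
    then ramping up linearly toward the noisiest timesteps (illustrative schedule)."""
    ramp = (t.float() - t_min) / max(num_timesteps - t_min, 1)
    return max_prob * ramp.clamp(0.0, 1.0)

def timestep_aware_augment(x0: torch.Tensor, t: torch.Tensor,
                           num_timesteps: int = 1000) -> torch.Tensor:
    """Flip each clean image in x0 horizontally with a probability that
    depends on its sampled diffusion timestep t."""
    p = augmentation_strength(t, num_timesteps)          # shape (B,)
    apply = torch.rand_like(p) < p                       # boolean mask, shape (B,)
    flipped = torch.flip(x0, dims=[-1])                  # horizontal flip
    return torch.where(apply[:, None, None, None], flipped, x0)

# Example usage in a training step (noising and loss computation omitted):
x0 = torch.randn(8, 3, 32, 32)                           # a batch of clean images
t = torch.randint(0, 1000, (8,))                         # sampled timesteps
x0_aug = timestep_aware_augment(x0, t)                   # augment before adding noise
```

The key design point the sketch illustrates is that the augmentation decision is made per sample, conditioned on that sample's timestep, so timestep intervals prone to distribution shift can receive weak or no augmentation while the rest receive it at full strength.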