Keywords: DIffusion Models, Diverse Samples, Information Theory
Abstract: Diffusion models are renowned for their state-of-the-art performance in generating high-quality images. Identifying samples with new information beyond the training data is essential for data augmentation, especially for enhancing model performance in diverse and unforeseen real-world scenarios. However, the investigation of new information in the generated samples has not been well explored. Our investigation through the lens of information theory reveals that diffusion models do not produce new information beyond what exists in the training data. Next, we introduce the concept of diverse samples (DS) to prove that generated images could contain information not present in the training data for diffusion models. Furthermore, we propose a method for identifying diverse samples among generated images by extracting deep features and detecting images that fall outside the boundary of real images. We demonstrate that diverse samples exist in the generated data of diffusion models, attributed to the estimation of forward and backward processes, but it can only produce a limited number of diverse samples, underscoring a notable gap in their capabilities in generating diverse samples. In addition, our experiment on the Chest X-ray dataset demonstrates that the diverse samples are more useful in improving classification accuracy than vanilla-generated samples. The source code is available at \url{https://github.com/lypz12024/diffusion-diverse-samples}.
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7170
Loading