Abstract: Latent diffusion models excel at producing high-quality images from text. Yet concerns have been raised about the lack of diversity in the generated imagery. To address this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity, extending to richer dimensions of variation.
Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach focuses on finding vectors in the Stable Diffusion latent space that are distant from each other. We repeatedly sample vectors in the latent space until we obtain a set that meets the desired pairwise-distance requirement and fills the required batch size.
To evaluate the effectiveness of our diversity method, we conduct experiments examining various characteristics, including color diversity, the LPIPS metric, and ethnicity/gender representation in images featuring humans. We also provide an image quality assessment by human raters.
The results of our experiments emphasize the significance of diversity in generating realistic and varied images, offering valuable insights for improving text-to-image models. By enhancing image diversity without a decrease in quality, our approach contributes to the creation of more inclusive and representative AI-generated art.
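The latent-space sampling procedure described in the abstract can be sketched as a simple rejection loop. This is a minimal illustration, not the authors' implementation: the function name, the Euclidean distance criterion, and the `min_dist` threshold are assumptions for the sketch; in practice the latents would be shaped for the Stable Diffusion UNet and passed to the diffusion sampler.

```python
import numpy as np


def sample_diverse_latents(batch_size, latent_dim, min_dist,
                           max_tries=10_000, seed=0):
    """Rejection-sample `batch_size` latent vectors from a standard
    normal prior so that every pair is at least `min_dist` apart
    (hypothetical sketch of the distance-based selection idea)."""
    rng = np.random.default_rng(seed)
    latents = []
    for _ in range(max_tries):
        candidate = rng.standard_normal(latent_dim)
        # Accept the candidate only if it is far from all accepted vectors.
        if all(np.linalg.norm(candidate - v) >= min_dist for v in latents):
            latents.append(candidate)
            if len(latents) == batch_size:
                return np.stack(latents)
    raise RuntimeError("could not satisfy the distance requirement; "
                       "lower min_dist or increase max_tries")
```

A usage note: for i.i.d. standard normal vectors in dimension d, the expected pairwise distance is roughly sqrt(2d), so `min_dist` should be set somewhat below that to keep rejection rates (and thus overhead) low.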
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=F0J1N6M5N9
Changes Since Last Submission: We thank the reviewer for their help; we have revised the paper according to their comments.
Quality/diversity trade-off: We add a study with human ratings (Section 5.4) to observe whether the diversity improvement implies decreased quality. We observe a slight (not statistically significant) quality improvement. We conclude that we improve diversity without decreasing quality and without significant computational overhead.
Risk of prompting methods: this risk has been widely emphasized by many articles, such as https://www.aljazeera.com/news/2024/3/9/why-google-gemini-wont-show-you-white-people
We modify the paper to make this point more visible.
Comparison with CFG scale: we add a paragraph explaining that the CFG scale typically trades quality against diversity, whereas our method does not penalize quality (see the new human rating study, Section 5.4).
Comparison with baselines such as vanilla SD or ENTIGEN (i.e., a prompt-manipulation method, a.k.a. meta-prompt):
We show, in Section 5.3, that we achieve superior diversity results (measured with LPIPS) compared to those obtained with prompt manipulation for enhanced diversity.
As in previous versions of the paper, we show (Sections 5.1 and 5.2) better diversity for our modified SD than for vanilla SD.
We add a “latexdiff” of all the modifications.
Assigned Action Editor: ~Colin_Raffel1
Submission Number: 2942