Keywords: sample diversity, text-to-image, diffusion, evaluation
TL;DR: We propose a method to evaluate sample diversity in T2I models
Abstract: Text-to-image (T2I) models are remarkable at generating realistic images based on textual descriptions. However, textual prompts are inherently *underspecified*: they do not specify all possible attributes of the required image. This raises key questions: do T2I models generate diverse outputs on typical underspecified prompts? How can we automatically measure diversity? We propose **GRADE**: **Gr**anular **A**ttribute **D**iversity **E**valuation, an automatic method for quantifying sample diversity. GRADE leverages the world knowledge embedded in large language models and visual question-answering systems to identify relevant concept-specific axes of diversity (e.g., "shape" and "color" for the concept "cookie"). It then estimates attribute distributions and quantifies diversity using (normalized) entropy. GRADE achieves over 90\% agreement with human judgments while correlating only weakly with commonly used diversity metrics. We use GRADE to measure the overall diversity of 12 T2I models using 400 concept-attribute pairs, revealing that even the most diverse models display limited variation. Further, we find these models often exhibit *default behaviors*, a situation where the model consistently generates concepts with the same attributes (e.g., 98\% of the cookies are round). Finally, we demonstrate that a key reason for low diversity is underspecified captions in training data.
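The normalized-entropy score mentioned in the abstract can be sketched as follows. This is not the authors' implementation; it is a minimal illustration assuming attribute values come from a VQA pass over generated images and that entropy is normalized by the log of the number of distinct observed values, so 0 means a single dominant attribute and 1 means a uniform spread.

```python
from collections import Counter
from math import log

def normalized_entropy(attribute_values):
    """Shannon entropy of the observed attribute distribution,
    normalized to [0, 1] by log(#distinct values) -- an assumption;
    GRADE may normalize over the full set of possible values."""
    counts = Counter(attribute_values)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if len(probs) <= 1:
        return 0.0  # only one attribute value observed: no diversity
    h = -sum(p * log(p) for p in probs)
    return h / log(len(probs))

# Default behavior from the abstract: 98% round cookies -> low score
skewed = normalized_entropy(["round"] * 98 + ["square"] * 2)
# A perfectly balanced distribution scores 1.0
uniform = normalized_entropy(["round", "square"])
```

Under this normalization, the 98%-round example in the abstract yields a score close to 0, making "default behaviors" directly visible in the metric.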
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10539