StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Keywords: diffusion, interpretation, attention, attribution, semantics, synthetic data, dataset
TL;DR: We introduce a large-scale synthetic dataset of diffusion-generated images paired with dense semantic attributions
Abstract: Understanding dense visual semantics remains a fundamental challenge in computer vision, as semantically similar objects can exhibit drastically different visual appearances. Recent advances in generative text-to-image frameworks have produced models that implicitly capture natural scene statistics: they learn complex relationships between objects, lighting, and other visual factors, enabling the generation of detailed and contextually rich images from text captions. To advance visual semantic understanding and support the development of more robust and interpretable vision models, we present StableSemantics, a large-scale dataset comprising 224 thousand human-curated prompts, processed natural-language captions, over 2 million synthetic images, and 10 million attention maps. The dataset provides fine-grained semantic attributions at the noun-chunk level: each prompt is a human-generated query that yielded visually interesting Stable Diffusion generations, and each prompt is paired with 10 generated images, with a cross-attention map for every noun chunk in every image. We analyze the semantic distribution of the generated images, examine the distribution of objects within them, and benchmark captioning and open-vocabulary segmentation methods on our data. As the first diffusion dataset to include dense attention attributions, we expect StableSemantics to catalyze advances in visual semantic understanding and provide a foundation for developing more sophisticated and effective visual models.
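To make the dataset's structure concrete, the sketch below shows one plausible way a single record could be organized and iterated in Python. It is a minimal illustration, not the paper's actual schema: the `StableSemanticsRecord` class and all field names (`prompt`, `caption`, `noun_chunks`, `images`, `attention_maps`) are hypothetical, assumed only from the quantities described in the abstract (10 generations per prompt, one cross-attention map per noun chunk per image).

```python
# Hypothetical sketch of a StableSemantics record; the real dataset's
# schema is not specified in this abstract.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class StableSemanticsRecord:
    prompt: str                       # human-curated prompt
    caption: str                      # processed natural-language caption
    noun_chunks: List[str]            # noun chunks extracted from the caption
    images: List[np.ndarray]          # 10 generations per prompt, each H x W x 3
    attention_maps: List[np.ndarray]  # one array per image, shaped (num_chunks, H, W)


def iter_attributions(record: StableSemanticsRecord):
    """Yield (image, noun_chunk, attention_map) triples for one record."""
    for image, maps in zip(record.images, record.attention_maps):
        # Each image carries one cross-attention map per noun chunk.
        for chunk, attn in zip(record.noun_chunks, maps):
            yield image, chunk, attn
```

Under this assumed layout, a single prompt contributes 10 images and (10 x number-of-noun-chunks) attention maps, which is consistent with the abstract's ratio of roughly 2 million images to 10 million attention maps.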
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7262