SRA-SD: A Lightweight Framework for Structure-Guided Compositional Image Synthesis
Keywords: structural knowledge augmented; text-to-image generation
Abstract: Diffusion models have demonstrated remarkable capabilities in text-to-image generation. However, they often fail to faithfully reflect the details specified in the text, omitting objects or rendering them with mismatched attributes or incorrect spatial locations. To address this problem, we propose SRA-SD, a lightweight structure-aware framework that enhances generation fidelity by explicitly modeling both spatial relations and attribute bindings. Our method introduces two complementary modules: (1) a spatial relation enhancement module that extracts relational triples via a large language model and encodes them into heterogeneous semantic graphs, enriching the text representation with structural layout knowledge through graph neural networks; and (2) an attribute enhancement module that enforces fine-grained object-attribute alignment via contrastive cross-attention learning, using syntactically derived positive pairs and semantically plausible negative samples. To better evaluate both capabilities, we introduce SRA-Bench, a new benchmark that jointly assesses spatial reasoning and attribute binding. Experiments on three datasets show that SRA-SD significantly improves generation accuracy with minimal parameter overhead, outperforming existing methods in complex, compositional scenarios.
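The contrastive object-attribute alignment described in module (2) can be illustrated with a minimal InfoNCE-style loss. This is a hedged sketch, not the paper's implementation: the function name `info_nce_loss`, the embedding dimensionality, the temperature value, and the use of cosine similarity are all assumptions; the actual objective, embedding spaces, and negative-sampling strategy are defined in the paper itself.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one object-attribute pair (sketch).

    anchor:    embedding of an object token (e.g. "ball")
    positive:  embedding of its syntactically bound attribute (e.g. "red")
    negatives: list of embeddings of mismatched attributes (e.g. attributes
               bound to other objects in the prompt)
    """
    def cos(a, b):
        # Cosine similarity between two vectors.
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Similarity of the anchor to the positive and to each negative,
    # scaled by the temperature.
    logits = np.array(
        [cos(anchor, positive)] + [cos(anchor, n) for n in negatives]
    ) / temperature

    # Softmax cross-entropy with the positive pair as the target class,
    # shifted for numerical stability.
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

# Toy usage: a well-aligned pair yields a near-zero loss, a misaligned
# pair a large one.
obj = np.array([1.0, 0.0])
good_attr = np.array([1.0, 0.0])
bad_attr = np.array([0.0, 1.0])
low = info_nce_loss(obj, good_attr, [bad_attr])
high = info_nce_loss(obj, bad_attr, [good_attr])
```

Minimizing this loss pulls each object embedding toward its true attribute and pushes it away from attributes belonging to other objects, which is the intended effect of the contrastive cross-attention learning described above.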
Primary Area: generative models
Submission Number: 7144