Keywords: dataset, compositional image generation, diffusion model
Abstract: Despite their success in generating high-quality images, text-to-image (T2I) models struggle to generate compositional scenes with multiple objects and their intricate relationships. We attribute this issue to limitations in existing image-text pair datasets, which provide only free-form prompts and lack precise inter-object relationship annotations. To resolve this, we construct LAION-Comp, a large-scale dataset of 540K+ aesthetic images structurally annotated with detailed scene graphs that explicitly encode multiple objects, their attributes, and intricate relations. The annotation pipeline employs a large vision-language model followed by partial human verification. Using LAION-Comp, we train four baseline models on diffusion and flow-matching backbones augmented with a dedicated scene graph encoder. For rigorous evaluation, we introduce CompSGen Bench, a benchmark with 20,838 test samples designed to systematically evaluate complex compositions. Experiments show that the four models trained on LAION-Comp outperform their original prompt-only counterparts as well as advanced scene-graph-based methods on both our new benchmark and existing compositional benchmarks. Furthermore, the learned structural conditioning naturally enables fine-grained, object-level image editing, demonstrating its potential as an effective editing interface. Our work validates the advantages of explicit structural annotation and contributes a foundational resource to the community for advancing controllable and compositional image synthesis.
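To make the dataset structure concrete, here is a minimal sketch of what a scene-graph annotation with objects, attributes, and relations might look like. All field names and the record layout are assumptions for illustration, not LAION-Comp's actual schema:

```python
# Hypothetical scene-graph annotation record, as described in the abstract:
# objects with attributes, plus inter-object relations. Field names are
# assumptions, not the paper's published format.
annotation = {
    "image_id": "laion_comp_000001",
    "caption": "a black cat sitting on a red sofa next to a lamp",
    "objects": [
        {"id": 0, "label": "cat",  "attributes": ["black"]},
        {"id": 1, "label": "sofa", "attributes": ["red"]},
        {"id": 2, "label": "lamp", "attributes": []},
    ],
    # Relations as (subject_id, predicate, object_id) triples,
    # giving the explicit inter-object structure that prompt-only
    # datasets lack.
    "relations": [
        (0, "sitting on", 1),
        (1, "next to", 2),
    ],
}
```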
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 5794