Relation-Augmented Diffusion for Layout-to-Image Generation

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Layout-to-image, Image Generation, Relation-Augmented, Diffusion
Abstract: Existing layout-to-image generation methods often struggle in complex scenes with multiple objects, exhibiting issues such as missing objects, positional errors, and semantic inconsistencies. These shortcomings largely stem from a fundamental inability to model inter-object relationships, which limits their capacity to capture spatial and relational cues. To address these challenges, we propose \textit{Relation-Augmented Diffusion}, a novel framework for layout-to-image generation that explicitly models inter-object relations and implicitly coordinates background-object interactions. We introduce a relation bounding box computation module that spatially encodes object interactions, transforming abstract relations into concrete visual representations. These are further embedded into a topological scene graph via a graph convolutional network, enabling bidirectional reasoning between objects and their relations. Additionally, we employ a layout fusion module to harmonize implicit background-object spatial dependencies, integrating global layout structures with background features to enhance overall scene coherence. Extensive experiments on HICO-DET, COCO-Position, and T2I-CompBench demonstrate that our framework significantly outperforms state-of-the-art methods in generating spatially and semantically consistent images.
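The abstract does not specify how the relation bounding box is computed; a minimal sketch, assuming the common convention that a relation's spatial extent is the union (tightest enclosing box) of the subject and object boxes, could look like the following. The function name `relation_box` and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not the paper's actual implementation.

```python
def relation_box(subj_box, obj_box):
    """Hypothetical relation bounding box: the tightest box enclosing
    both the subject and object boxes, each given as (x1, y1, x2, y2).
    This is one plausible reading of the paper's relation bounding box
    computation module, not its confirmed implementation."""
    return (
        min(subj_box[0], obj_box[0]),  # left edge of the union
        min(subj_box[1], obj_box[1]),  # top edge of the union
        max(subj_box[2], obj_box[2]),  # right edge of the union
        max(subj_box[3], obj_box[3]),  # bottom edge of the union
    )

# Example: a "person riding bicycle" relation spanning both boxes
person = (10, 5, 60, 90)
bicycle = (30, 50, 110, 100)
print(relation_box(person, bicycle))  # → (10, 5, 110, 100)
```

Such a union box gives each abstract relation a concrete spatial region, which can then serve as a node alongside object nodes in the scene graph processed by the graph convolutional network.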
Primary Area: generative models
Submission Number: 7548