Relation-Augmented Diffusion for Layout-to-Image Generation

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Layout-to-image, Image Generation, Relation-Augmented, Diffusion
Abstract: Existing layout-to-image generation methods often struggle in complex scenes with multiple objects, exhibiting issues such as missing objects, positional errors, and semantic inconsistencies. These shortcomings largely stem from a fundamental inability to model inter-object relationships, which limits their capacity to capture spatial and relational cues. To address these challenges, we propose \textit{Relation-Augmented Diffusion}, a novel framework for layout-to-image generation that explicitly models inter-object relations and implicitly coordinates background-object interactions. We introduce a relation bounding box computation module that spatially encodes object interactions, transforming abstract relations into concrete visual representations. These are further embedded into a topological scene graph via a graph convolutional network, enabling bidirectional reasoning between objects and their relations. Additionally, we employ a layout fusion module to harmonize implicit background-object spatial dependencies, integrating global layout structures with background features to enhance overall scene coherence. Extensive experiments on HICO-DET, COCO-Position, and T2I-CompBench demonstrate that our framework significantly outperforms state-of-the-art methods in generating spatially and semantically consistent images.
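The abstract does not specify how the relation bounding box is computed; a minimal sketch, assuming the common convention that a relation's spatial extent is the union (tightest enclosing box) of the subject and object boxes, could look like the following. The function name `relation_box` and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not the paper's actual implementation.

```python
def relation_box(subj_box, obj_box):
    """Hypothetical relation bounding box: the tightest box enclosing
    both the subject and object boxes, each given as (x1, y1, x2, y2).
    This is one plausible reading of the paper's relation bounding box
    computation module, not its confirmed implementation."""
    return (
        min(subj_box[0], obj_box[0]),  # left edge of the union
        min(subj_box[1], obj_box[1]),  # top edge of the union
        max(subj_box[2], obj_box[2]),  # right edge of the union
        max(subj_box[3], obj_box[3]),  # bottom edge of the union
    )

# Example: a "person riding bicycle" relation spanning both boxes
person = (10, 5, 60, 90)
bicycle = (30, 50, 110, 100)
print(relation_box(person, bicycle))  # → (10, 5, 110, 100)
```

Such a union box gives each abstract relation a concrete spatial region, which can then serve as a node alongside object nodes in the scene graph processed by the graph convolutional network.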
Primary Area: generative models
Submission Number: 7548