CHOrD: Synthesizing Spatially Coherent, House-Scale, Organized, and Diverse 3D Indoor Scenes via Image-Based Layout Guidance
Keywords: Indoor Scene Synthesis, Digital Twins, Procedural Generation, Generative Models
TL;DR: A generative framework that employs an intermediate image-based layout representation to synthesize spatially coherent, house-scale 3D indoor scenes with state-of-the-art quality and diversity, along with a new high-quality dataset.
Abstract: We introduce CHOrD, a generative framework for synthesizing spatially coherent, house-scale, hierarchically organized, and diverse 3D indoor scenes. At the core of CHOrD is a two-stage generation paradigm: given a floor plan, CHOrD first synthesizes an intermediate, image-based 2D layout representation, which is subsequently transformed into a graph-based scene structure. In contrast to existing tabular or LLM-based generative models, CHOrD's enhanced spatial capabilities substantially reduce long-standing artifacts frequently observed in prior work, such as physically implausible collisions, out-of-bound objects, inconsistent orientations, and incomplete layouts missing essential object placements. Furthermore, unlike existing methods, CHOrD can be conditioned on complex, irregular room shapes and robustly synthesizes house-wide layouts that adhere to both the geometric and semantic structure of the floor plan. We also introduce a novel layout dataset with expanded coverage of object categories and room configurations, as well as significantly improved data quality. CHOrD achieves state-of-the-art performance on both the 3D-FRONT dataset and our proposed dataset, excelling in spatial coherence, quality, and diversity, without relying on collision detection, iterative re-generation for self-correction, or predefined rules.
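The two-stage paradigm described in the abstract can be sketched in miniature. The sketch below is purely illustrative and is not from the paper: all names (`synthesize_layout`, `layout_to_graph`, `SceneNode`) are hypothetical, and stage 1 uses a trivial placement rule where the actual system would run a conditional image generator over the floor plan.

```python
# Hypothetical sketch of a two-stage floor-plan -> layout-image -> scene-graph
# pipeline. Stage 1 paints object labels onto a 2D grid; stage 2 lifts the
# labeled cells into a graph-based scene structure.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    category: str                         # e.g. "room", "sofa"
    position: tuple                       # (x, y) cell in the layout image
    children: list = field(default_factory=list)

def synthesize_layout(floor_plan):
    """Stage 1 (stand-in): produce an image-based 2D layout from a floor plan.
    Here a toy rule places one object per free center-column cell; a real
    model would be a learned, floor-plan-conditioned image generator."""
    h, w = len(floor_plan), len(floor_plan[0])
    layout = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if floor_plan[y][x] == 1 and x == w // 2:
                layout[y][x] = "sofa"
    return layout

def layout_to_graph(layout):
    """Stage 2 (stand-in): transform the 2D layout into a scene graph by
    attaching one node per labeled cell under a single room root."""
    root = SceneNode("room", (0, 0))
    for y, row in enumerate(layout):
        for x, label in enumerate(row):
            if label is not None:
                root.children.append(SceneNode(label, (x, y)))
    return root

# Usage: a 3x4 floor plan where 1 marks free interior space.
plan = [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 1, 1, 0]]
scene = layout_to_graph(synthesize_layout(plan))
print(len(scene.children))  # -> 3 placed objects
```

The point of the intermediate image representation is that spatial constraints (bounds, collisions, orientation) are naturally expressed per-pixel before any discrete scene structure is committed to.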
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4402