CHOrD: Synthesizing Spatially Coherent, House-Scale, Organized, and Diverse 3D Indoor Scenes via Image-Based Layout Guidance
Keywords: Indoor Scene Synthesis, Digital Twins, Procedural Generation, Generative Models
TL;DR: A generative framework that employs an intermediate image-based layout representation to synthesize spatially coherent, house-scale 3D indoor scenes with state-of-the-art quality and diversity, along with a new high-quality dataset.
Abstract: We introduce CHOrD, a generative framework for synthesizing spatially coherent, house-scale, hierarchically organized, and diverse 3D indoor scenes. At the core of CHOrD is a two-stage generation paradigm: given a floor plan, CHOrD first synthesizes an intermediate, image-based 2D layout representation, which is subsequently transformed into a graph-based scene structure. In contrast to existing tabular or LLM-based generative models, CHOrD's enhanced spatial capabilities substantially reduce long-standing artifacts frequently observed in prior work, such as physically implausible collisions, out-of-bound objects, inconsistent orientations, and incomplete layouts missing essential object placements. Furthermore, unlike existing methods, CHOrD can be conditioned on complex, irregular room shapes and robustly synthesizes house-wide layouts that adhere to both the geometric and semantic structure of the floor plan. We also introduce a novel layout dataset with expanded coverage of object categories and room configurations, as well as significantly improved data quality. CHOrD achieves state-of-the-art performance on both the 3D-FRONT dataset and our proposed dataset, excelling in spatial coherence, quality, and diversity, without relying on collision detection, iterative re-generation for self-correction, or predefined rules.
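The two-stage paradigm described in the abstract can be sketched in miniature. The sketch below is purely illustrative and is not from the paper: all names (`synthesize_layout`, `layout_to_graph`, `SceneNode`) are hypothetical, and stage 1 uses a trivial placement rule where the actual system would run a conditional image generator over the floor plan.

```python
# Hypothetical sketch of a two-stage floor-plan -> layout-image -> scene-graph
# pipeline. Stage 1 paints object labels onto a 2D grid; stage 2 lifts the
# labeled cells into a graph-based scene structure.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    category: str                         # e.g. "room", "sofa"
    position: tuple                       # (x, y) cell in the layout image
    children: list = field(default_factory=list)

def synthesize_layout(floor_plan):
    """Stage 1 (stand-in): produce an image-based 2D layout from a floor plan.
    Here a toy rule places one object per free center-column cell; a real
    model would be a learned, floor-plan-conditioned image generator."""
    h, w = len(floor_plan), len(floor_plan[0])
    layout = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if floor_plan[y][x] == 1 and x == w // 2:
                layout[y][x] = "sofa"
    return layout

def layout_to_graph(layout):
    """Stage 2 (stand-in): transform the 2D layout into a scene graph by
    attaching one node per labeled cell under a single room root."""
    root = SceneNode("room", (0, 0))
    for y, row in enumerate(layout):
        for x, label in enumerate(row):
            if label is not None:
                root.children.append(SceneNode(label, (x, y)))
    return root

# Usage: a 3x4 floor plan where 1 marks free interior space.
plan = [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 1, 1, 0]]
scene = layout_to_graph(synthesize_layout(plan))
print(len(scene.children))  # -> 3 placed objects
```

The point of the intermediate image representation is that spatial constraints (bounds, collisions, orientation) are naturally expressed per-pixel before any discrete scene structure is committed to.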
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4402