EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

ACL ARR 2024 June Submission3932 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Language-guided 3D scene editing has emerged as a pivotal technology in fields such as virtual reality, augmented reality, gaming, architecture, and film production. Traditional 3D scene editing requires extensive expertise and time due to the complexity of 3D environments. Recent advances in language-guided 3D scene editing offer promising solutions, but existing approaches either restrict editing to generated scenes or focus on appearance modifications without supporting comprehensive layout changes. In this work, we propose EditRoom, a novel framework for language-guided 3D room layout editing that addresses these limitations. EditRoom leverages Large Language Models (LLMs) for command planning and a graph diffusion-based method for executing six editing types: rotate, translate, scale, replace, add, and remove. In addition, we introduce EditRoom-DB, a large-scale dataset of 83k editing pairs, for training and evaluation. Our approach significantly improves the accuracy and coherence of scene editing, effectively handling complex commands that involve multiple operations. Experimental results demonstrate EditRoom's superior performance in both single-operation and complex editing scenarios, highlighting its potential for practical applications.
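To make the planning stage concrete, the sketch below illustrates how an LLM planner might decompose a complex natural-language command into the six atomic edit types named in the abstract. This is a hypothetical interface for illustration only: the `EditOp` schema, target names, and parameter format are assumptions, not the paper's actual specification.

```python
from dataclasses import dataclass, field
from enum import Enum

# The six atomic editing types supported by EditRoom, per the abstract.
class EditType(Enum):
    ROTATE = "rotate"
    TRANSLATE = "translate"
    SCALE = "scale"
    REPLACE = "replace"
    ADD = "add"
    REMOVE = "remove"

@dataclass
class EditOp:
    # Hypothetical plan step: the paper does not publish this schema.
    op: EditType
    target: str                  # scene-graph node, e.g. "chair_1"
    params: dict = field(default_factory=dict)

def validate_plan(plan):
    """Check that every step in an LLM-produced plan uses a supported edit type."""
    return all(isinstance(step.op, EditType) for step in plan)

# A complex command such as "move the chair next to the desk and remove
# the lamp" could be decomposed by the planner into atomic steps:
plan = [
    EditOp(EditType.TRANSLATE, "chair_1", {"relative_to": "desk_1"}),
    EditOp(EditType.REMOVE, "lamp_1"),
]
assert validate_plan(plan)
```

Under this framing, each atomic step would then be handed to the graph diffusion model for execution, so the LLM only has to produce a sequence of well-typed operations rather than the layout itself.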
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: 3D Scene Editing, Large Language Model, Diffusion-based Models
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 3932