S-INF: Towards Realistic Indoor Scene Synthesis via Scene Implicit Neural Field

Published: 09 Dec 2024 · Last Modified: 05 Mar 2025 · AAAI 2025 · CC BY 4.0
Abstract: Learning-based methods have become increasingly popular in 3D indoor scene synthesis (ISS), showing superior performance over traditional optimization-based approaches. These learning-based methods typically model distributions over simple yet explicit scene representations using generative models. However, because such oversimplified explicit representations overlook detailed information and lack guidance from the multimodal relationships within a scene, most learning-based methods struggle to generate realistic and diverse indoor scenes. In this paper, we introduce a new method, the Scene Implicit Neural Field (S-INF), for indoor scene synthesis, which learns meaningful representations of multimodal relationships to enhance the diversity and realism of indoor scenes. S-INF directly extracts advantageous latent features from the entire scene in a multi-scale manner, effectively capturing multimodal relationships. Furthermore, by learning specialized scene layout relationships and projecting them into S-INF, we achieve realistic generation of scene layouts. Additionally, S-INF captures dense and detailed object relationships through differentiable rendering, ensuring stylistic consistency across objects. Through extensive experiments on the benchmark 3D-FRONT dataset, we demonstrate that our method consistently achieves state-of-the-art performance under different settings for the indoor scene synthesis task.
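To make the central idea concrete, the sketch below illustrates one plausible reading of a scene implicit neural field: an MLP conditioned on a per-scene latent code that maps 3D query coordinates to scene features, with a hypothetical layout head decoding per-object box parameters. This is a minimal illustrative sketch, not the authors' architecture; the class, head, and all dimensions are assumptions for exposition.

```python
# Illustrative sketch of a scene implicit neural field (NOT the
# authors' implementation). An MLP maps a 3D query coordinate plus a
# scene latent code to a per-point feature; a hypothetical layout
# head decodes each feature into a 3D box (center, size, yaw).
import torch
import torch.nn as nn


class SceneImplicitField(nn.Module):
    def __init__(self, latent_dim=256, hidden_dim=512, feat_dim=128):
        super().__init__()
        # Coordinate + scene latent -> per-point scene feature.
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, feat_dim),
        )
        # Hypothetical layout head: feature -> (cx, cy, cz, w, h, d, yaw).
        self.layout_head = nn.Linear(feat_dim, 7)

    def forward(self, coords, scene_latent):
        # coords: (B, N, 3) query points; scene_latent: (B, latent_dim).
        z = scene_latent.unsqueeze(1).expand(-1, coords.shape[1], -1)
        feats = self.mlp(torch.cat([coords, z], dim=-1))  # (B, N, feat_dim)
        boxes = self.layout_head(feats)                   # (B, N, 7)
        return feats, boxes


# Usage: query the field at candidate object anchor points.
field = SceneImplicitField()
coords = torch.rand(2, 16, 3)    # 16 query points per scene
latent = torch.randn(2, 256)     # per-scene latent code
feats, boxes = field(coords, latent)
print(feats.shape, boxes.shape)  # (2, 16, 128), (2, 16, 7)
```

Conditioning a coordinate-based MLP on a shared scene latent is one standard way an implicit field can encode whole-scene context at every query point, which is consistent with the abstract's claim of capturing scene-level relationships beyond explicit per-object representations.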
