Open-Scene Understanding-oriented 3D Scene Graph Generation

Yuansu Hao, Fei Yu, Yanhao Wang, Yuehua Li, Quan Deng, Yuan Yu, Chen Huang, Nan Che

Published: 2025 · Last Modified: 03 Mar 2026 · ICME 2025 · CC BY-SA 4.0
Abstract: Understanding complex 3D environments is essential for many computer vision and robotics applications, especially in highly dynamic open-scene scenarios. The 3D scene graph plays an important role in comprehending such environments. However, most existing methods for 3D scene graph generation depend on pre-specified object and relationship classes (i.e., a closed vocabulary) and labeled training data, which limits their effectiveness in open-scene settings. To address this issue, we propose a novel Open-Scene Understanding-oriented 3D Scene Graph (OSU-3DSG) framework that operates without labeled training data. The OSU-3DSG framework extracts visual features from RGB-D image sequences and fuses them with camera pose estimates to build accurate 3D object maps. It then leverages a pre-trained Vision Language Model (VLM) to generate relational triplets and construct 3D scene graphs in a zero-shot manner. In particular, it adaptively recognizes and interprets object relationships, making it well suited to open-world applications. Finally, we conduct extensive experiments on two open-world 3D datasets, 3DSSG and Replica, to evaluate the effectiveness and adaptability of OSU-3DSG, demonstrating its potential to advance open-scene understanding. Our code and data are published at https://github.com/YuansuHao/OSU-3DSG.
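To make the pipeline described in the abstract concrete, the following is a minimal sketch of its two stages: back-projecting RGB-D frames with camera poses into world-frame object point clouds, then querying a pre-trained VLM for relational triplets between object pairs. This is not the authors' implementation; the helper interfaces (`ObjectNode`, `query_vlm`, the 3.0 m pruning radius) are illustrative assumptions, and the actual detection and VLM components are left abstract.

```python
# Hedged sketch of an OSU-3DSG-style pipeline (assumptions noted per function).
from dataclasses import dataclass

import numpy as np


@dataclass
class ObjectNode:
    """An object instance accumulated into the 3D object map (assumed schema)."""
    label: str           # open-vocabulary label from an upstream detector
    points: np.ndarray   # (N, 3) world-frame point cloud for this object

    @property
    def centroid(self) -> np.ndarray:
        return self.points.mean(axis=0)


def backproject(depth: np.ndarray, K: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Lift a depth map into a world-frame point cloud (standard pinhole model).

    depth: (H, W) metric depth; K: (3, 3) intrinsics; pose: (4, 4) camera-to-world.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    uv1 = np.stack([u.reshape(-1), v.reshape(-1), np.ones_like(z)], axis=0)
    cam = np.linalg.inv(K) @ (uv1 * z)            # (3, H*W) camera-frame points
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    world = (pose @ cam_h)[:3].T                  # (H*W, 3) world-frame points
    return world[valid]


def build_scene_graph(nodes: list[ObjectNode], query_vlm) -> list[tuple]:
    """Zero-shot relational triplets: ask a VLM about each nearby object pair.

    `query_vlm(subject, obj) -> str | None` is an assumed interface wrapping a
    pre-trained VLM; it returns a predicate string or None if no relation holds.
    """
    triplets = []
    for i, subj in enumerate(nodes):
        for obj in nodes[i + 1:]:
            # Prune distant pairs before the (expensive) VLM query;
            # the 3.0 m threshold is an illustrative choice, not the paper's.
            if np.linalg.norm(subj.centroid - obj.centroid) > 3.0:
                continue
            predicate = query_vlm(subj, obj)
            if predicate is not None:
                triplets.append((subj.label, predicate, obj.label))
    return triplets
```

Separating geometric mapping from relationship inference in this way is what allows the VLM stage to remain zero-shot: the graph's vocabulary is whatever the detector and VLM produce, not a fixed label set.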