Object-Relational Graph Framework for Zero-Shot 3D Scene Segmentation

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Primary Area: learning on graphs and other geometries & topologies
Keywords: Deep learning, Point cloud analysis, 3D scene segmentation
Abstract: Zero-shot recognition of real-world objects is essential for 3D scene understanding, because objects absent from the training data frequently appear in natural scenes. Prior works have proposed zero-shot scene segmentation approaches that align 3D point features with other data modalities, but they often rely on image sets paired with the 3D data. More importantly, they treat scene objects independently and thereby neglect the rich relational information inherent in scenes. Relational information between objects plays a pivotal role in identifying unseen objects within a scene: it turns the zero-shot scene understanding problem into questions such as 'which object is likely to be adjacent to object A and on top of object B?'. To this end, we introduce ORG, a novel open-vocabulary 3D scene segmentation strategy that embeds 3D scenes into a knowledge graph framework. Our framework constructs entity graphs among scene objects using a 2D segmentation foundation model and learns relational knowledge within this graph structure. By semantically aligning node embeddings with the text embedding space, ORG performs zero-shot inference effectively while leveraging prior relational knowledge specific to a given scene. Our method consistently outperforms existing zero-shot scene segmentation approaches on three 3D scene understanding datasets: S3DIS, ScanNetV2, and 3DSSG.
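The following is an illustrative sketch only, not the ORG implementation, which the abstract does not specify. It shows one plausible reading of "aligning node embeddings with the text embedding space": each graph node (a scene object) is assigned the label whose text embedding has the highest cosine similarity with it. The function names (`cosine`, `aggregate_neighbors`, `zero_shot_labels`), the toy 3-dimensional embeddings, and the neighbor-averaging step are all assumptions made for illustration. In particular, neighbor averaging is a simple stand-in for relational context, not the learned relational knowledge described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def aggregate_neighbors(node_emb, edges, self_weight=0.8):
    """Hypothetical stand-in for relational context: blend each node's
    embedding with the mean embedding of its graph neighbors."""
    neighbors = {name: [] for name in node_emb}
    for a, b in edges:  # edges are treated as undirected
        neighbors[a].append(b)
        neighbors[b].append(a)
    blended = {}
    for name, emb in node_emb.items():
        if not neighbors[name]:
            blended[name] = list(emb)  # isolated node: keep as-is
            continue
        count = len(neighbors[name])
        mean = [sum(node_emb[m][i] for m in neighbors[name]) / count
                for i in range(len(emb))]
        blended[name] = [self_weight * e + (1.0 - self_weight) * m
                         for e, m in zip(emb, mean)]
    return blended


def zero_shot_labels(node_emb, text_emb):
    """Assign each node the label whose text embedding is most similar."""
    return {
        name: max(text_emb, key=lambda label: cosine(emb, text_emb[label]))
        for name, emb in node_emb.items()
    }


# Toy data (assumed): two object nodes, three candidate class names.
node_emb = {"seg0": [0.9, 0.1, 0.0], "seg1": [0.1, 0.8, 0.2]}
text_emb = {"table": [1.0, 0.0, 0.0],
            "chair": [0.0, 1.0, 0.0],
            "lamp": [0.0, 0.0, 1.0]}
edges = [("seg0", "seg1")]  # e.g. "seg1 is adjacent to seg0"

labels = zero_shot_labels(node_emb, text_emb)
context_labels = zero_shot_labels(aggregate_neighbors(node_emb, edges), text_emb)
print(labels)
print(context_labels)
```

Because labels are chosen by similarity to text embeddings rather than by a fixed classifier head, a class name unseen during training can be added by inserting its embedding into `text_emb`. In a real system those embeddings would come from a pretrained text encoder rather than hand-written vectors.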
Submission Number: 9467