SR$^2$: BOOSTING 3D LARGE LANGUAGE MODEL WITH SPATIAL RELATION REASONING

26 Sept 2024 (modified: 13 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: 3D Large Language Model, Spatial Relation Reasoning, 3D Segmentation
Abstract: Recent research in point cloud perception has made considerable progress in scene understanding by aligning vision and language through large language models (LLMs). However, existing methods still struggle with complex instructions that require accurate spatial reasoning, even though 3D point cloud data provides detailed spatial cues such as size, position, and orientation for identifying the targets. To tackle this issue, this study introduces a new 3D multi-modal LLM framework, Spatial Relation Reasoning (SR$^2$), designed to strengthen relational reasoning capabilities in 3D environments. SR$^2$ mimics human reasoning behavior by first broadly identifying all relevant elements and then carefully examining them to determine the target. In addition, as current datasets may not comprehensively evaluate the complex spatial reasoning capabilities of different models, we propose a new benchmark named 3D ReasonSeg, which consists of 25,000 training and 4,152 evaluation samples of high quality. Both quantitative and qualitative experiments demonstrate that SR$^2$ and 3D ReasonSeg endow 3D point cloud perception with stronger spatial reasoning capabilities, and we hope they can serve as a new baseline and benchmark for future work. The code and model will be made publicly available.
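As a rough illustration of the coarse-to-fine behavior the abstract describes (first broadly identifying all relevant elements, then examining their spatial relations to determine the target), the sketch below uses toy data and hypothetical helper names (`identify_candidates`, `examine_candidates`, `Candidate`); it is an assumption-laden sketch, not code from the SR$^2$ framework or the 3D ReasonSeg benchmark.

```python
# Hypothetical sketch of a two-stage "identify broadly, then examine" pipeline.
# All names and heuristics here are illustrative placeholders, not SR^2 APIs.

from dataclasses import dataclass


@dataclass
class Candidate:
    label: str                           # coarse category, e.g. "chair"
    center: tuple[float, float, float]   # object center in scene coordinates
    size: tuple[float, float, float]     # bounding-box extents


def identify_candidates(scene: list[Candidate], category: str) -> list[Candidate]:
    """Stage 1: broadly collect every element that could match the instruction."""
    return [c for c in scene if c.label == category]


def examine_candidates(candidates: list[Candidate], anchor: Candidate) -> Candidate:
    """Stage 2: examine spatial relations (here, distance to an anchor object)
    to decide which candidate the instruction actually refers to."""
    def dist(c: Candidate) -> float:
        return sum((a - b) ** 2 for a, b in zip(c.center, anchor.center)) ** 0.5
    return min(candidates, key=dist)


if __name__ == "__main__":
    # Toy scene: "the chair next to the table"
    scene = [
        Candidate("chair", (0.5, 2.0, 0.0), (0.5, 0.5, 1.0)),
        Candidate("chair", (4.0, 1.0, 0.0), (0.5, 0.5, 1.0)),
        Candidate("table", (3.5, 1.2, 0.0), (1.5, 0.8, 0.7)),
    ]
    table = next(c for c in scene if c.label == "table")
    chairs = identify_candidates(scene, "chair")        # broad pass
    target = examine_candidates(chairs, anchor=table)   # fine-grained decision
    print(target)                                       # the chair nearest the table
```

In an actual 3D LLM pipeline, both stages would of course be driven by learned segmentation and language reasoning rather than the hand-written distance heuristic used here for brevity.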
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6126