Reasoning3D - Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models

Published: 05 Mar 2025, Last Modified: 18 Mar 2025 · Reasoning and Planning for LLMs @ ICLR 2025 · CC BY 4.0
Keywords: Reasoning Segmentation, 3D Segmentation, 3D Model Parsing, 3D Part Understanding, Large Language Model, Large Vision-Language Model, Computer-Human Interaction
Abstract: In this paper, we introduce Zero-Shot 3D Reasoning Segmentation, a new task and paradigm in 3D segmentation that goes beyond traditional category-specific methods. We propose a baseline method, Reasoning3D, which leverages pre-trained 2D segmentation networks together with Large Language Models (LLMs) to interpret user queries and segment 3D meshes with contextual awareness. This approach enables fine-grained part segmentation and generates natural language explanations without requiring extensive 3D datasets. Experiments demonstrate that Reasoning3D can effectively localize and highlight parts of 3D objects. Our training-free method allows rapid deployment and serves as a universal baseline for future research in fields such as robotics, object manipulation, autonomous driving, AR/VR, and medical applications. The code and the user interface have been released publicly.
Submission Number: 1