Reasoning3D - Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models

Tianrun Chen; Chunan Yu; Jing Li; Jianqi Zhang; Lanyun Zhu; Deyi Ji; Yong Zhang; Ying Zang; Lingyun Sun; Zejian Li

Published: 05 Mar 2025, Last Modified: 18 Mar 2025

Venue: Reasoning and Planning for LLMs @ ICLR2025

Readers: Everyone

License: CC BY 4.0

Keywords: Reasoning Segmentation, 3D Segmentation, 3D Model Parsing, 3D Part Understanding, Large Language Model, Large Vision-Language Model, Computer-Human Interaction

Abstract: We introduce a new task, Zero-Shot 3D Reasoning Segmentation, a paradigm in 3D segmentation that goes beyond traditional category-specific methods. We propose a baseline method, Reasoning3D, that leverages pre-trained 2D segmentation networks powered by Large Language Models (LLMs) to interpret user queries and segment 3D meshes with contextual awareness. This approach enables fine-grained part segmentation and generates natural language explanations without requiring extensive 3D datasets. Experiments demonstrate that Reasoning3D can effectively localize and highlight parts of 3D objects. Because the method is training-free, it can be deployed rapidly and can serve as a universal baseline for future research in fields such as robotics, object manipulation, autonomous driving, AR/VR, and medical applications. The code and the user interface have been released publicly.
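The abstract does not spell out how 2D segmentation results are transferred to the mesh. One plausible way to do it, sketched below purely as an illustration, is to render the mesh from several viewpoints, obtain a binary mask per view from the 2D network, and vote per mesh face. Everything in this sketch is an assumption rather than the released Reasoning3D code: the function name `fuse_view_masks`, the per-pixel face-index grids, and the `min_ratio` threshold are all hypothetical.

```python
def fuse_view_masks(view_face_ids, view_masks, num_faces, min_ratio=0.5):
    """Fuse per-view 2D masks into per-face labels by voting (hypothetical helper).

    view_face_ids: list of 2D grids (lists of rows); each pixel holds the index
        of the mesh face visible at that pixel, or -1 for background.
    view_masks: list of 2D grids with the same shapes; each pixel is 1 if the
        2D segmentation network selected it for the query, else 0.
    num_faces: number of faces in the mesh.
    min_ratio: assumed threshold; a face is selected when at least this
        fraction of its visible pixels, over all views, is masked.
    """
    hits = [0] * num_faces  # masked pixels observed per face
    seen = [0] * num_faces  # visible pixels observed per face
    for face_ids, mask in zip(view_face_ids, view_masks):
        for id_row, mask_row in zip(face_ids, mask):
            for face, selected in zip(id_row, mask_row):
                if face < 0:
                    continue  # background pixel, no face to vote for
                seen[face] += 1
                if selected:
                    hits[face] += 1
    # Faces never visible in any view stay unselected.
    return [seen[f] > 0 and hits[f] / seen[f] >= min_ratio
            for f in range(num_faces)]
```

In this sketch the face-index grids would come from whatever renderer produces the views, and the masks from the query-conditioned 2D segmenter; both are treated as given inputs so that the voting step stays independent of any particular library.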

Submission Number: 1
