iGrasp: An Interactive 2D-3D Framework for 6-DoF Grasp Detection

Jian-Jian Jiang, Xiao-Ming Wu, Zibo Chen, Yi-Lin Wei, Wei-Shi Zheng

Published: 2024, Last Modified: 05 Mar 2025, ICPR (30) 2024, CC BY-SA 4.0

Abstract: For 6-DoF grasp detection, we introduce a new interactive 2D-3D framework that filters out irrelevant information and makes the two modalities collaborate effectively, generating robust grasps while greatly accelerating inference. Existing learning-to-grasp methods, which either use only 3D point clouds or leverage both 2D textures and 3D point clouds, cannot accomplish this. Our framework, iGrasp, is a novel three-step design in which interaction modelling between 2D textures and 3D point clouds enhances both modalities. Concretely, we propose a 2D-to-3D interaction that uses objectness masks generated from 2D textures to filter out target-irrelevant information in the 3D point clouds. Then, we introduce a 3D-to-2D interaction that uses structural priors from 3D point cloud features, via cross-attention and cylinder grouping, to refine the 2D texture features. Finally, we combine the refined 2D texture features with the 3D point cloud features to generate high-quality 6-DoF grasp poses. Experiments on the large-scale real-world dataset GraspNet-1Billion demonstrate that iGrasp surpasses state-of-the-art methods by 4.66/3.53 mAP on RealSense/Kinect and reduces inference time by 28%. Real-world experiments further verify the effectiveness of iGrasp.
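The three mechanisms named in the abstract (mask-based point filtering, cylinder grouping, and cross-attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a depth image pixel-aligned with the RGB image (so a 2D objectness mask indexes the organized point cloud directly), a pinhole camera model, a single-head attention with a residual connection, and hypothetical function names and parameters (`depth_to_points`, `filter_by_objectness`, `cylinder_group`, `cross_attention`, `radius`, `hmin`, `hmax`).

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project a depth image (H, W) into an organized point cloud (H, W, 3)
    in the camera frame, assuming a pinhole model (hypothetical helper)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row
    z = depth.astype(np.float64) / depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


def filter_by_objectness(points, objectness_mask, min_depth=1e-6):
    """2D-to-3D idea (sketch): keep only 3D points whose pixel the 2D branch
    marks as object, discarding invalid (zero-depth) pixels as well."""
    valid = points[..., 2] > min_depth
    keep = objectness_mask.astype(bool) & valid
    return points[keep]  # (N, 3)


def cylinder_group(points, center, approach, radius=0.05, hmin=-0.02, hmax=0.04):
    """Gather points inside a cylinder whose axis passes through `center`
    along `approach`; radius and height range are illustrative values."""
    axis = approach / np.linalg.norm(approach)
    rel = points - center
    h = rel @ axis                                    # signed height along the axis
    radial = np.linalg.norm(rel - np.outer(h, axis), axis=1)
    inside = (h >= hmin) & (h <= hmax) & (radial <= radius)
    return points[inside]


def cross_attention(q_feats, kv_feats, wq, wk, wv):
    """3D-to-2D idea (sketch): 2D features (queries) attend to 3D features
    (keys/values); single head, residual connection, softmax over 3D tokens."""
    q, k, v = q_feats @ wq, kv_feats @ wk, kv_feats @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return q_feats + attn @ v                         # refined 2D features


if __name__ == "__main__":
    depth = np.full((4, 4), 500, dtype=np.uint16)     # 0.5 m everywhere
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True                             # toy objectness mask
    obj_pts = filter_by_objectness(depth_to_points(depth, 100.0, 100.0, 2.0, 2.0), mask)
    print(obj_pts.shape)                              # (4, 3)
```

In a full pipeline the filtered points would presumably be sampled into seeds, grouped around candidate approach directions, and the fused 2D/3D features passed to a grasp-prediction head; the abstract does not specify those details, so the sketch stops at the individual operations.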