IFG: Internet-Scale Guidance for Functional Grasping Generation

Published: 21 May 2026, Last Modified: 21 May 2026, Venue: ICRA 2026, License: CC BY 4.0
Keywords: dexterous manipulation, simulation, deep learning, robotics
TL;DR: IFG bridges semantic understanding and geometric precision by using internet-scale VLMs to guide simulation-based grasp generation, enabling robust, functional, dexterous grasping in cluttered scenes without any manually collected human data.
Abstract: Large vision models trained on internet-scale data have demonstrated strong capabilities in segmenting and semantically understanding object parts, even in cluttered scenes. However, while these models can direct a robot toward the general region of an object, they lack the geometric understanding required to precisely control dexterous robotic hands for 3D grasping. To overcome this, our key insight is to leverage simulation with a force-closure grasp generation pipeline that accounts for the local geometry of the hand and of the objects in the scene. Because this pipeline is slow and requires ground-truth observations, the generated dataset is distilled into a diffusion model that operates on camera point clouds. By combining the global semantic understanding of internet-scale models with the geometric precision of simulation-based, locally aware force-closure grasp generation, IFG achieves high-performance semantic grasping without any manually collected training data. For visualizations, please visit our website at https://ifgrasping.github.io/
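Illustrative note: the abstract relies on the notion of force closure without defining it. As a rough intuition only, the sketch below implements the textbook planar two-contact test (a grasp with two frictional point contacts is in force closure when the line joining the contacts lies strictly inside both friction cones). This is not the IFG pipeline, which targets multi-fingered dexterous hands in 3D; the function names, the 2D setting, and the friction coefficient `mu` are assumptions made for illustration.

```python
import math


def in_friction_cone(direction, inward_normal, mu):
    """Return True if `direction` lies strictly inside the planar friction cone.

    The cone is centred on `inward_normal` (pointing into the object) and has
    half-angle atan(mu). Hypothetical helper, not taken from the paper.
    """
    dx, dy = direction
    nx, ny = inward_normal
    d_norm = math.hypot(dx, dy)
    n_norm = math.hypot(nx, ny)
    if d_norm == 0.0 or n_norm == 0.0:
        return False  # degenerate input: coincident contacts or zero normal
    cos_angle = (dx * nx + dy * ny) / (d_norm * n_norm)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.acos(cos_angle) < math.atan(mu)


def two_contact_force_closure(p1, n1, p2, n2, mu):
    """Planar two-contact force-closure test with Coulomb friction.

    p1, p2: contact points (x, y); n1, n2: inward surface normals at the
    contacts; mu: friction coefficient (assumed equal at both contacts).
    """
    line_12 = (p2[0] - p1[0], p2[1] - p1[1])
    line_21 = (-line_12[0], -line_12[1])
    return in_friction_cone(line_12, n1, mu) and in_friction_cone(line_21, n2, mu)


# Opposing contacts on two parallel faces: an antipodal grasp.
antipodal = two_contact_force_closure((-1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (-1.0, 0.0), mu=0.5)

# Second contact shifted along its face: the connecting line leaves the cones.
offset = two_contact_force_closure((-1.0, 0.0), (1.0, 0.0), (1.0, 1.0), (-1.0, 0.0), mu=0.3)
```

Here `antipodal` evaluates to `True` and `offset` to `False`. Checks of this kind need exact contact points and surface normals, which is consistent with the abstract's remark that the generation pipeline requires ground-truth observations.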
Submission Number: 25