IFG: Internet-Scale Guidance for Functional Grasping Generation

Published: 01 Jun 2026, Last Modified: 01 Jun 2026
Venue: IEEE ICRA 2026 Workshop Xplore Oral
Readers: Everyone
License: CC BY 4.0
Keywords: dexterous manipulation, simulation, deep learning, robotics
TL;DR: IFG bridges semantic understanding and geometric precision by using internet-scale VLMs to guide simulation-based grasp generation, enabling robust, functional, dexterous grasping in cluttered scenes without any manually collected human data.
Abstract: Large Vision Models trained on internet-scale data have demonstrated strong capabilities in segmenting and semantically understanding object parts, even in cluttered scenes. However, while these models can direct a robot toward the general region of an object, they lack the geometric understanding required to precisely control dexterous robotic hands for 3D grasping. To overcome this, our key insight is to leverage simulation with a force-closure grasp generation pipeline that accounts for the local geometry of the hand and the object in the scene. Because this pipeline is slow and requires ground-truth observations, the generated dataset is distilled into a diffusion model that can operate on camera point clouds. By combining the global semantic understanding of internet-scale models with the geometric precision of a simulation-based, locally aware force-closure pipeline, IFG achieves high-performance semantic grasping without any manually collected training data. For visualizations, please visit our website at https://ifgrasping.github.io/
Submission Number: 4