TIGR: a Mixture-of-Foundation-Model-Experts for 3D-informed Task-Aware Grasping

Published: 27 May 2026, Last Modified: 27 May 2026. ICRA 2026 SRRA Workshop (Lightning Talk Poster). License: CC BY 4.0
Keywords: Foundation Models, Mixture of Experts, Task-Aware Grasping, 3D Segmentation
TL;DR: TIGR combines 3D reconstruction with foundation-model fusion to enable task-aware grasping of unknown objects, outperforming baselines across 809 real-robot trials.
Abstract: Task-aware grasping of unknown objects involves interpreting a natural-language instruction to identify the correct target object in a scene, localizing the task-relevant part, and generating collision-aware grasps from both observable and occluded approach directions. We present TIGR (Task-aware Intelligent Grasping for dexterous Robots), a pipeline that combines 3D reconstruction with a mixture of foundation-model experts: the experts' predictions on rendered viewpoints of the reconstruction are fused into a unified 3D representation. This representation lets TIGR recover functional parts outside the camera's field of view and generate grasps from all collision-free approach directions. We evaluate on a real-robot benchmark of 20 everyday objects, each paired with 2–3 task formulations and tested from 5 viewpoints, totaling 809 trials across TIGR, GraspMolmo, and ShapeGrasp. TIGR outperformed both baselines, achieving 97.8% object-identification accuracy, 82.6% task-relevant segmentation success, and 52.6% grasp success, with at least one successful grasp on every object in the set.
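The abstract does not say how the per-view expert predictions are fused into the 3D representation. One common approach is to project every reconstructed 3D point into each rendered view, look up the expert's 2D score for the task-relevant part, and take a weighted average over all views in which the point is visible. The sketch below illustrates that idea under stated assumptions and is not TIGR's implementation: it assumes the reconstruction is a point cloud, each expert emits a per-pixel probability map for the task-relevant part, camera intrinsics and world-to-camera poses of the rendered views are known, and an optional rendered depth map is available for an occlusion test. The names `ViewPrediction`, `fuse_predictions`, `weight`, and `depth_tol` are hypothetical.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViewPrediction:
    """One expert's 2D prediction on one rendered viewpoint (hypothetical structure)."""
    K: np.ndarray                       # 3x3 pinhole intrinsics, last row [0, 0, 1]
    R: np.ndarray                       # 3x3 rotation, world -> camera
    t: np.ndarray                       # translation (3,), world -> camera
    scores: np.ndarray                  # HxW probability that a pixel is the task-relevant part
    depth: Optional[np.ndarray] = None  # HxW rendered depth, used for an occlusion test
    weight: float = 1.0                 # hypothetical per-expert confidence weight


def fuse_predictions(points, views, depth_tol=0.01):
    """Weighted average of 2D scores over every view in which each 3D point is visible.

    Returns an (N,) array of fused scores; points never observed get NaN.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    score_sum = np.zeros(n)
    weight_sum = np.zeros(n)
    for v in views:
        cam = points @ v.R.T + v.t       # camera-frame coordinates, shape (N, 3)
        z = cam[:, 2]
        in_front = z > 1e-6
        # Perspective projection; substitute z = 1 for points behind the camera
        # to avoid dividing by zero (they are masked out below anyway).
        safe_z = np.where(in_front, z, 1.0)
        uvw = cam @ v.K.T
        u = np.round(uvw[:, 0] / safe_z).astype(int)
        row = np.round(uvw[:, 1] / safe_z).astype(int)
        h, w = v.scores.shape
        visible = in_front & (u >= 0) & (u < w) & (row >= 0) & (row < h)
        idx = np.nonzero(visible)[0]
        if v.depth is not None:
            # Occlusion test: keep only points lying on the rendered surface.
            surface = v.depth[row[idx], u[idx]]
            idx = idx[np.abs(surface - z[idx]) <= depth_tol]
        # idx holds unique point indices, so fancy-index accumulation is safe here.
        score_sum[idx] += v.weight * v.scores[row[idx], u[idx]]
        weight_sum[idx] += v.weight
    fused = np.full(n, np.nan)
    seen = weight_sum > 0
    fused[seen] = score_sum[seen] / weight_sum[seen]
    return fused
```

Thresholding the fused scores (for example, treating points with a score above 0.5 as the task-relevant part) would give a 3D part segmentation that also covers surfaces the physical camera never saw, provided some rendered view observes them. A plain weighted mean is only one fusion rule; majority voting or learned per-expert weights are equally plausible alternatives.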
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36