Keywords: Dexterous Grasping, Flow Matching, Large Language Models
TL;DR: We introduce FLASH, a method for language-conditioned dexterous grasping that jointly models task intent and physical contact quality for robot hands.
Abstract: We introduce FLASH, a method for language-conditioned dexterous grasping that jointly models task intent and physical contact quality for robot hands. Unlike prior approaches, our text-conditioned grasp synthesis pipeline explicitly incorporates geometric information during generation. FLASH learns a single flow-matching model conditioned on hand and object point clouds and on natural language instructions. Our model operates on live-updated, vectorized hand meshes and is trained on our improved grasp dataset, FLASH-drive, which includes refined grasps, watertight meshes, and augmented text annotations. This enables FLASH to outperform prior work in producing physically plausible grasps that align with goals specified via text. We use a pre-trained large language model as the backbone of our architecture, enabling generalization to novel prompts and objects.
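The abstract describes learning a flow-matching model conditioned on point clouds and text. As a minimal sketch of what flow-matching training targets look like (linear-interpolant / rectified-flow style), the snippet below is purely illustrative: the 7-D "grasp pose" vectors, batch size, and zero-output stand-in network are assumptions, not the FLASH architecture.

```python
# Minimal sketch of the flow-matching training target (linear path).
# Names and dimensions are illustrative, not taken from FLASH.
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x0, x1, t):
    """Interpolant x_t and target velocity for a straight-line path."""
    t = t[:, None]                      # broadcast time over feature dim
    x_t = (1.0 - t) * x0 + t * x1       # point on the path from noise to data
    v_target = x1 - x0                  # constant velocity along that path
    return x_t, v_target

# Toy batch: 4 samples of a 7-D grasp parameterization (illustrative).
x1 = rng.normal(size=(4, 7))            # "data" samples (e.g. grasp poses)
x0 = rng.normal(size=(4, 7))            # noise samples
t = rng.uniform(size=4)                 # time in [0, 1]

x_t, v_target = flow_matching_targets(x0, x1, t)

# Training would regress a conditional network v_theta(x_t, t, cond) onto
# v_target, where cond holds point-cloud and language features; here a
# zero output stands in for the network to show the loss form only.
v_pred = np.zeros_like(v_target)
loss = np.mean((v_pred - v_target) ** 2)
```

In this formulation the conditioning (point-cloud and text embeddings) enters only through the velocity network, while the interpolant and target stay unchanged.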
Submission Number: 10