FLASH: Flow-Based Language-Annotated Grasp Synthesis for Dexterous Hands

Published: 19 Sept 2025, Last Modified: 19 Sept 2025
Venue: CoRL 2025 Workshop Dexterous Manipulation Spotlight
License: CC BY 4.0
Keywords: Dexterous Grasping, Flow Matching, Large Language Models
TL;DR: We introduce FLASH, a method for language-conditioned dexterous grasping that jointly models task intent and physical contact quality for robot hands.
Abstract: We introduce FLASH, a method for language-conditioned dexterous grasping that jointly models task intent and physical contact quality for robot hands. Unlike prior approaches, our text-conditioned grasp synthesis pipeline is explicitly aware of geometric information during generation. FLASH learns a single flow-matching model conditioned on hand and object point clouds and natural language instructions. Our model operates on live-updated, vectorized hand meshes and is trained on our improved grasp dataset, FLASH-drive, which includes refined grasps, watertight meshes, and augmented text annotations. This enables FLASH to outperform prior work in producing physically plausible grasps that align with goals specified via text. We use a pre-trained large language model as the backbone of our architecture, enabling generalization to novel prompts and objects.
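For readers unfamiliar with flow matching, the generic training and sampling recipe can be sketched as below. This is an illustrative sketch, not the FLASH implementation: the abstract does not state which probability path, network, or grasp parameterization is used. The linear interpolation path, the Euler sampler, and the names `flow_matching_pair`, `flow_matching_loss`, `sample`, `velocity_fn`, and `cond` are all assumptions introduced here. In particular, `cond` is a plain array standing in for whatever encoding of point clouds and language the model actually consumes.

```python
import numpy as np


def flow_matching_pair(x1, rng):
    """Build one training pair for linear-path flow matching (assumed path).

    x1: (batch, dim) array of target samples, e.g. grasp parameters.
    Returns the sampled times t, the interpolated points x_t, and the
    regression target for the velocity network.
    """
    x0 = rng.standard_normal(x1.shape)        # noise sample at t = 0
    t = rng.uniform(size=(x1.shape[0], 1))    # one time per batch element
    x_t = (1.0 - t) * x0 + t * x1             # point on the straight path
    v_target = x1 - x0                        # constant velocity of that path
    return t, x_t, v_target


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between predicted and target velocities."""
    t, x_t, v_target = flow_matching_pair(x1, rng)
    v_pred = velocity_fn(x_t, t, cond)        # hypothetical conditional model
    return float(np.mean((v_pred - v_target) ** 2))


def sample(velocity_fn, cond, dim, rng, steps=50):
    """Generate samples by Euler-integrating the velocity field from t=0 to t=1."""
    x = rng.standard_normal((cond.shape[0], dim))
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

In this sketch `velocity_fn(x, t, cond)` would be the learned network. As a sanity check, when the target for each condition is a single point `c`, the velocity field `(c - x) / (1 - t)` carries every noise sample to `c` under the sampler above.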
Submission Number: 10