Instruct2Subtask: A Language Parsing Framework for Sequential Robotic Manipulation

16 Nov 2025 (modified: 29 Dec 2025) · ICC 2025 Workshop RAS Submission · CC BY 4.0
Keywords: Vision Language Models, Robotic Manipulation, Task and Motion Planning
Abstract: We present a modular, robot-agnostic manipulation framework that executes long-horizon tasks specified through natural-language instructions. The system integrates a Vision-Language Supervisory Planner (VLM-SP), a Grasp-Pose Estimator (GPE), and a structured skill repository containing the robot's executable skills. Given a natural-language instruction, the VLM planner decomposes the task into grounded subtasks, an essential step for reducing planning complexity, enabling skill reuse, and ensuring robust execution. Each subtask is then mapped to an appropriate skill and paired with an object-specific grasp prediction for reliable manipulation.
Submission Number: 9
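
The abstract describes a plan-then-ground pipeline: the VLM-SP decomposes an instruction into subtasks, and each subtask is paired with a skill from the repository and a grasp from the GPE. The minimal Python sketch below illustrates one plausible shape for that loop; every name here (Subtask, SKILL_REPOSITORY, vlm_plan, estimate_grasp) and the canned planner output are illustrative assumptions, not the paper's actual interfaces, with the real VLM-SP and GPE calls replaced by stubs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    skill: str   # name of an executable skill, e.g. "pick"
    target: str  # object the subtask is grounded to

# Hypothetical skill repository: maps skill names to executable routines.
# A real system would dispatch to robot-specific controllers here.
SkillFn = Callable[[str, tuple], None]
SKILL_REPOSITORY: dict[str, SkillFn] = {
    "pick":  lambda obj, grasp: print(f"pick  {obj} using grasp {grasp}"),
    "place": lambda obj, grasp: print(f"place {obj} at {grasp}"),
}

def vlm_plan(instruction: str) -> list[Subtask]:
    """Stub for the VLM-SP: would query a vision-language model to
    decompose the instruction into grounded subtasks."""
    return [Subtask("pick", "red block"), Subtask("place", "red block")]  # canned output

def estimate_grasp(obj: str) -> tuple:
    """Stub for the GPE: would predict an object-specific grasp pose."""
    return (0.0, 0.0, 0.0)  # placeholder pose

def run(instruction: str) -> None:
    """Decompose, then map each subtask to a skill paired with a grasp."""
    for st in vlm_plan(instruction):
        SKILL_REPOSITORY[st.skill](st.target, estimate_grasp(st.target))

run("put the red block in the bin")
```

In a real deployment, vlm_plan would also take the scene observation and estimate_grasp would return the GPE's object-specific grasp pose; routing all execution through the skill repository is what would keep the framework robot agnostic, since only the repository's entries need to change per robot.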