Keywords: Vision-Language-Action model, robotic manipulation, contact-rich manipulation, manipulation planning, robot learning
TL;DR: Contact-VLA is a modular framework that integrates vision-based scene modeling, LLM-driven strategy generation, and dynamic planning to enable zero-shot adaptive manipulation in contact-rich tasks.
Abstract: Vision-Language-Action (VLA) systems often lack adaptability and explainability because of their black-box structure and their dependence on fixed action sets learned from extensive teleoperated datasets, which limits their effectiveness in complex, dynamic manipulation scenarios. To address this issue, we propose Contact-Rich Adaptive LLM-based Control (CoRAL), a novel modular framework for complex, dynamic, and contact-rich manipulation tasks. By integrating foundation vision and language models with motion planning and reactive controllers, our system achieves zero-shot planning and adaptive manipulation without relying on extensive teleoperated action datasets. Unlike conventional VLAs, we explicitly separate the roles of vision models and Large Language Models (LLMs): the vision module initializes environmental parameters and tracks object poses, while the LLM generates initial contact strategies and estimates cost functions. Together, these modules establish a physical understanding of the scene, instantiated as a dynamic world model for our planner. Ablation studies show that this modular approach significantly improves both the explainability and the performance of the overall framework. Furthermore, we introduce a memory unit that stores past manipulation experiences, so that learned contact strategies and parameter adjustments can be generalized and reused efficiently across diverse manipulation scenarios. Experiments on challenging contact-rich tasks validate the framework's robustness and highlight the design elements critical to its effectiveness.
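To make the described division of labor concrete, here is a minimal toy sketch of the data flow, assuming a 2D pushing task. It is not the authors' implementation. Every name in it (`SceneParams`, `ContactStrategy`, `MemoryUnit`, `plan_cost`, `greedy_step`) is hypothetical. The vision module's output is stubbed as a pose and friction value, and the LLM's output is stubbed as a contact mode plus cost weights. The memory unit retrieves by simple word-overlap (Jaccard) similarity, and the planner is a one-step greedy search. These stand-ins are chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneParams:
    # Hypothetical vision-module output: tracked object pose (x, y, yaw) and an
    # initialized environment parameter (a friction coefficient in [0, 1)).
    object_pose: tuple
    friction: float


@dataclass
class ContactStrategy:
    # Hypothetical LLM output: a contact mode label and weights for the planner's cost.
    contact_mode: str
    cost_weights: dict


@dataclass
class MemoryUnit:
    # Hypothetical memory: stores past strategies keyed by the task description's words.
    entries: list = field(default_factory=list)

    def store(self, task: str, strategy: ContactStrategy) -> None:
        self.entries.append((set(task.lower().split()), strategy))

    def retrieve(self, task: str, threshold: float = 0.5):
        # Return the stored strategy with the highest Jaccard word overlap, if above threshold.
        query = set(task.lower().split())
        best, best_score = None, 0.0
        for words, strategy in self.entries:
            score = len(query & words) / len(query | words)
            if score > best_score:
                best, best_score = strategy, score
        return best if best_score >= threshold else None


def plan_cost(scene: SceneParams, strategy: ContactStrategy, goal: tuple) -> float:
    # World-model cost: squared pose error weighted by the LLM-provided weights.
    x, y, yaw = scene.object_pose
    gx, gy, gyaw = goal
    w = strategy.cost_weights
    return (w.get("position", 1.0) * ((x - gx) ** 2 + (y - gy) ** 2)
            + w.get("orientation", 1.0) * (yaw - gyaw) ** 2)


def greedy_step(scene: SceneParams, strategy: ContactStrategy, goal: tuple, step: float = 0.05):
    # One-step greedy planner over small pushes. Toy assumption: friction scales down
    # the effective displacement of the object.
    x, y, yaw = scene.object_pose
    candidates = [(dx, dy) for dx in (-step, 0.0, step) for dy in (-step, 0.0, step)]

    def cost_after(d):
        scale = 1.0 - scene.friction
        moved = SceneParams((x + d[0] * scale, y + d[1] * scale, yaw), scene.friction)
        return plan_cost(moved, strategy, goal)

    return min(candidates, key=cost_after)


# Usage: reuse a stored strategy for a similar task, then plan one step toward the goal.
memory = MemoryUnit()
memory.store("push box to goal",
             ContactStrategy("side_push", {"position": 1.0, "orientation": 0.1}))
strategy = memory.retrieve("push the box to goal")
scene = SceneParams(object_pose=(0.0, 0.0, 0.0), friction=0.2)
action = greedy_step(scene, strategy, goal=(1.0, 1.0, 0.0))
```

Keeping the vision output, the LLM output, and the memory as separate typed records mirrors the abstract's point that explicit role separation aids explainability: each intermediate quantity can be inspected on its own.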
Primary Area: applications to robotics, autonomy, planning
Submission Number: 25221