Keywords: Robotic Grasping, Large Language Model
TL;DR: We adapt the reasoning ability of multi-modal Large Language Models for numerical prediction in robotic grasping tasks.
Abstract: Large language models (LLMs) have garnered increasing popularity owing to their remarkable reasoning capabilities. However, their utility in robotics has been largely confined to manipulation planning, primarily because their outputs are text-based. To overcome this limitation, this paper explores the potential of LLMs for numerical prediction in robotics, with a specific focus on robotic grasping. We propose Reasoning Tuning, a novel approach that harnesses the extensive prior knowledge embedded within LLMs and optimizes them for numerical prediction tasks. This method enables LLMs, particularly those with multi-modal capabilities, to generate precise numerical outputs such as grasp poses for robot arms. The proposed method is extensively validated on a grasping benchmark and in real-world grasping experiments, demonstrating that multi-modal LLMs can be adapted for numerical prediction tasks in robotics. This not only extends their applicability but also bridges the gap between text-based planning and direct robot control with LLMs.
Submission Number: 22