Embodied large language models enable robots to complete complex tasks in unpredictable environments

Published: 01 Jan 2025 · Last Modified: 14 May 2025 · Nature Machine Intelligence (2025) · License: CC BY-SA 4.0
Abstract: Completing complex tasks in unpredictable settings challenges robotic systems, requiring a step change in machine intelligence. Sensorimotor abilities are considered integral to human intelligence. Thus, biologically inspired machine intelligence might usefully combine artificial intelligence with robotic sensorimotor capabilities. Here we report an embodied large-language-model-enabled robot (ELLMER) framework, utilizing GPT-4 and a retrieval-augmented generation infrastructure, to enable robots to complete long-horizon tasks in unpredictable settings. The method extracts contextually relevant examples from a knowledge base, producing action plans that incorporate force and visual feedback, enabling adaptation to changing conditions. We tested ELLMER on a robot tasked with coffee making and plate decoration; these tasks consist of a sequence of sub-tasks, from drawer opening to pouring, each benefiting from distinct feedback types and methods. We show that the ELLMER framework allows the robot to complete the tasks. This demonstration marks progress towards scalable, efficient and ‘intelligent robots’ able to complete complex tasks in uncertain environments.

Editor's summary: To function in the real world, autonomous robots will have to respond to unanticipated situations. A vision-language-model-based approach is proposed to solve long-horizon robotic tasks, which can adapt to a dynamic environment.
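For illustration of the retrieval-augmented planning idea described in the abstract, below is a minimal Python sketch: task-relevant examples are retrieved from a small knowledge base and prepended to the prompt of a language model that produces an action plan. The knowledge-base entries, skill names (move_to, pour_until, etc.) and the query_llm stub are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a retrieval-augmented planning loop, inspired by the
# ELLMER description. Knowledge base, skill names and the LLM stub are
# hypothetical placeholders for illustration only.

from collections import Counter
import math

# Hypothetical knowledge base: natural-language task descriptions paired
# with example action plans (parameterised skill calls).
KNOWLEDGE_BASE = [
    {"description": "open a kitchen drawer using force feedback",
     "plan": "move_to(drawer_handle); grasp(); pull_until(force_x > 5.0)"},
    {"description": "pour liquid from a kettle into a cup",
     "plan": "move_above(cup); tilt_until(mass_delta >= target_mass)"},
    {"description": "draw a decoration on a plate using visual feedback",
     "plan": "locate(plate); trace(sketch_path, correct_with=camera_pose)"},
]

def bag_of_words(text):
    """Very simple text representation for similarity scoring."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(task, k=2):
    """Return the k knowledge-base entries most similar to the task."""
    query = bag_of_words(task)
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda e: cosine_similarity(query, bag_of_words(e["description"])),
        reverse=True,
    )
    return scored[:k]

def query_llm(prompt):
    """Placeholder for a GPT-4 call; here it just returns the prompt."""
    return f"[plan generated from prompt]\n{prompt}"

def plan(task):
    """Build a retrieval-augmented prompt and ask the model for a plan."""
    examples = retrieve(task)
    context = "\n".join(f"Task: {e['description']}\nPlan: {e['plan']}"
                        for e in examples)
    prompt = (f"Relevant examples:\n{context}\n\n"
              f"New task: {task}\n"
              "Produce an action plan that uses force and visual feedback "
              "where needed.")
    return query_llm(prompt)

if __name__ == "__main__":
    print(plan("make a cup of coffee, including opening the drawer and pouring water"))
```

In a real system the bag-of-words retrieval would typically be replaced by embedding-based similarity search over a curated skill library, and query_llm would call an actual language model whose output is parsed into executable, feedback-conditioned skills.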
