LARG2, Language-based Automatic Reward and Goal Generation

22 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Robot Learning, Goal-Conditioned Reward Learning, MTRL, Reward Shaping, Large Language Models, Code Generation, Chain-of-Thought
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Large Language Models for automatic goal and reward function generation from textual task descriptions in GCRL and MTRL settings.
Abstract: Robotic tasks currently addressed with reinforcement learning, such as locomotion, navigation, and manipulation, are challenged by the problem of defining reward functions to maximize and goals to reach. Alternative methodologies, like imitation learning, often require labor-intensive human annotation to produce datasets of task descriptions associated with trajectories. In response, this paper introduces "Language-based Automatic Reward and Goal Generation" (LARG), a framework that harnesses the code-generation capabilities of Large Language Models (LLMs) to convert text-based task descriptions into corresponding reward and goal-generation functions. For this purpose, we leverage chain-of-thought prompting and the common-sense knowledge embedded in LLMs, complemented by automatic error-discovery and correction mechanisms. We validate the effectiveness of LARG through extensive experiments on robotic manipulation, demonstrating its ability to train and execute policies without human annotation of any kind.
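The abstract describes a pipeline in which an LLM turns a textual task description into executable reward code, with failures fed back for automatic correction. The sketch below is a hypothetical illustration of such a loop, not the authors' implementation: the `llm` callable, prompt wording, and function names are assumptions introduced here for clarity.

```python
# Hypothetical sketch of an LLM-driven reward-generation loop in the spirit of LARG.
# The `llm` callable, prompt wording, and helper names are illustrative assumptions.
from typing import Callable, Dict
import traceback

PROMPT_TEMPLATE = """You are a robotics expert. Think step by step, then write a Python
function `reward(observation: dict, goal: dict) -> float` for the following task:
{task}
Return only the code."""


def generate_reward_fn(task_description: str,
                       llm: Callable[[str], str],
                       sample_obs: Dict,
                       sample_goal: Dict,
                       max_retries: int = 3) -> Callable:
    """Ask the LLM for a reward function; on failure, retry with the error message."""
    prompt = PROMPT_TEMPLATE.format(task=task_description)
    for _ in range(max_retries):
        code = llm(prompt)
        try:
            namespace: Dict = {}
            exec(code, namespace)                 # compile the generated code
            reward_fn = namespace["reward"]
            reward_fn(sample_obs, sample_goal)    # smoke-test on a sample transition
            return reward_fn
        except Exception:
            # Feed the traceback back to the LLM for automatic error correction.
            prompt = (f"{code}\n\nThe code above failed with:\n"
                      f"{traceback.format_exc()}\nPlease return a corrected version.")
    raise RuntimeError("Could not obtain a valid reward function.")
```

In a GCRL or MTRL setting, the returned `reward_fn` (and an analogous goal-generation function) would then be plugged into the training loop in place of a hand-engineered reward.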
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5460