LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements

Thomas Pouplin; Kasia Kobalczyk; Hao Sun; Mihaela van der Schaar

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Large Language Models, Language-conditioned policy, Offline policy learning, Decision Making Agent, Goals generalization, Domain generalization

TL;DR: We train LLM agents as language-conditioned policies without requiring expensive labeled data or online experimentation. The framework leverages LLMs to enable the use of unlabeled datasets and improve generalization to unseen goals and states.

Abstract: To develop autonomous agents capable of executing complex, multi-step decision-making tasks as specified by humans in natural language, existing reinforcement learning approaches typically require expensive labeled datasets or access to real-time experimentation. Moreover, conventional methods often face difficulties in generalizing to unseen goals and states, thereby limiting their practical applicability. This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning. TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states. To address the challenges posed by such data and evaluation settings, our method leverages the prior knowledge and instruction-following capabilities of large language models (LLMs) to enhance the fidelity of pre-collected offline data and enable flexible generalization to new goals and states. Empirical results demonstrate that the dual role of LLMs in our framework—as data enhancers and generalizers—facilitates both effective and data-efficient learning of generalizable language-conditioned policies.

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10568
