Autotelic LLM-based exploration for goal-conditioned RL

Published: 09 Oct 2024, Last Modified: 02 Dec 2024, NeurIPS 2024 Workshop IMOL, TinyPaper (Poster), CC BY 4.0
Track: Full track
Keywords: Autotelic, Goal-generation, Goal-conditioned RL, Open-endedness
TL;DR: Using LLMs to generate goals as code for a goal-conditioned RL learner in an open world.
Abstract: Autotelic agents, capable of autonomously generating and pursuing their own goals, represent a promising approach to open-ended learning and skill acquisition in reinforcement learning. Learning to set and pursue one's own goals is even more difficult in open worlds, where the agent must invent new, previously unobserved goals. In this work, we propose an architecture in which a single generalist autotelic agent is trained on an automatic curriculum of goals. We leverage large language models (LLMs) to generate goals as code for reward functions, guided by learnability and difficulty estimates. The goal-conditioned RL agent is then trained on goals sampled according to learning progress. We compare our method to an adaptation of OMNI-EPIC to goal-conditioned RL. Our preliminary experiments suggest that our method generates a higher proportion of learnable goals, indicating better adaptation to the goal-conditioned learner.
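The abstract describes two mechanisms: goals represented as LLM-generated code for reward functions, and training goals sampled according to learning progress. The snippet below is a minimal sketch of those two ideas, not the paper's implementation; all names (`Goal`, `sample_goal`, the `reward(obs)` convention) and the windowed success-rate estimate of learning progress are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Goal:
    description: str          # natural-language goal proposed by the LLM (hypothetical format)
    reward_fn_code: str       # LLM-generated reward function, as Python source
    successes: list = field(default_factory=list)  # recent success/failure history (0/1)

    def reward_fn(self):
        # Compile the generated code; assumes it defines `reward(obs) -> float`.
        scope = {}
        exec(self.reward_fn_code, scope)
        return scope["reward"]

    def learning_progress(self, window=20):
        # Absolute change in success rate between the older and newer halves of
        # the recent history: a simple learning-progress estimate (assumed here).
        h = self.successes[-window:]
        if len(h) < 4:
            return 1.0  # optimistic prior so new goals get sampled at least a few times
        mid = len(h) // 2
        return abs(sum(h[mid:]) / (len(h) - mid) - sum(h[:mid]) / mid)

def sample_goal(goals):
    # Sample a training goal with probability proportional to its learning progress.
    weights = [g.learning_progress() for g in goals]
    return random.choices(goals, weights=weights, k=1)[0]

# Usage: a goal whose reward function was (hypothetically) written by an LLM.
goal = Goal(
    description="collect wood",
    reward_fn_code="def reward(obs):\n    return 1.0 if obs.get('wood', 0) > 0 else 0.0",
)
g = sample_goal([goal])
print(g.description, g.reward_fn()({"wood": 1}))
```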
Submission Number: 52