A Single Goal is All You Need

Published: 09 Oct 2024, Last Modified: 02 Dec 2024, NeurIPS 2024 Workshop IMOL Poster, CC BY 4.0
Track: Full track
Keywords: contrastive RL, goal-conditioned RL, emergent exploration, skill learning
TL;DR: We present empirical evidence of skills and directed exploration emerging from a simple RL algorithm without the aid of rewards, demonstrations, or subgoals.
Abstract: In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed. For example, in a manipulation task, the agent is given a single observation of the goal state and learns a sequence of skills: first moving its end-effector, then pushing the block, and finally lifting and placing it. These skills emerge before the agent has ever successfully placed the block at the goal location, and without the aid of any reward functions, demonstrations, or manually specified distance metrics. Implementing our method involves a simple modification of prior work and does not require density estimates, ensembles, or any additional hyperparameters. We lack a clear theoretical understanding of why the method works so effectively, though our experiments provide some hints.
Submission Number: 5
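Since the abstract describes the method as a simple modification of prior contrastive RL work, a minimal sketch of the standard goal-conditioned contrastive critic may help readers situate it. Everything below is an illustrative assumption rather than the authors' implementation: the `ContrastiveCritic` class, network sizes, and the PyTorch framing are hypothetical, and the loss is the generic InfoNCE objective used in contrastive RL, where future states from the same trajectory serve as positives and other batch entries as negatives, so no reward function, density estimate, or ensemble is needed.

```python
# Sketch (not the authors' code) of a goal-conditioned contrastive critic:
# f(s, a, g) = phi(s, a)^T psi(g) scores whether goal g is a likely future
# state of the state-action pair (s, a). All names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 64):
        super().__init__()
        # phi(s, a): embeds a state-action pair
        self.sa_encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))
        # psi(g): embeds a goal observation
        self.g_encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))

    def forward(self, obs, act, goal):
        phi = self.sa_encoder(torch.cat([obs, act], dim=-1))  # (B, d)
        psi = self.g_encoder(goal)                            # (B, d)
        return phi @ psi.T                                    # (B, B) logits

def critic_loss(critic, obs, act, future_obs):
    # Diagonal entries are positives: future_obs[i] was actually reached
    # from (obs[i], act[i]); the rest of the batch serves as negatives.
    logits = critic(obs, act, future_obs)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```

In this framing, the single-goal setting the title refers to would amount to conditioning the learned policy on one fixed goal observation, with the critic trained entirely from the agent's own trajectories.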