Track: Research Track
Keywords: Self-supervised RL, Goal-conditioned RL, Unsupervised goal sampling
Abstract: Goal-conditioned reinforcement learning (GCRL) enables agents to learn to achieve a variety of goals. However, it is often limited by the need for a pre-defined goal sampling distribution. Prior works attempted to lift this limitation by training additional components that propose goals at the frontier of the agent's capabilities, for example by fitting goal-coverage density estimators or by training separate goal-sampler networks. These approaches are difficult to scale to higher-dimensional goals because of the added challenge of modeling or sampling high-dimensional variables. To address this problem, we introduce Unsupervised Contrastive Goal Reaching (UCGR), a simple algorithm that enables the agent to propose its own training goals without additional networks or density estimators. UCGR leverages the learned critic in the contrastive reinforcement learning framework as an implicit, dynamics-aware model of reachability. Our experiments show that UCGR outperforms strong prior methods on a variety of tasks, particularly when goals are complex and high-dimensional.
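The core idea stated in the abstract is to reuse the learned contrastive critic as a reachability estimate when proposing training goals. Below is a minimal sketch of such a goal-proposal step, assuming a critic(state, goal) callable that returns a reachability logit and a pool of previously visited goals (replay_goals); the frontier weighting p * (1 - p) and the temperature are illustrative assumptions for this sketch, not necessarily the paper's exact criterion.

import numpy as np

def propose_goals(critic, state, replay_goals, num_goals, temperature=1.0):
    # Hypothetical sketch: score candidate goals from the replay buffer with the
    # learned critic and sample training goals that sit near the frontier of the
    # agent's capabilities (neither trivially reachable nor hopeless).
    scores = np.array([critic(state, g) for g in replay_goals])
    # Convert critic logits to reachability probabilities.
    p = 1.0 / (1.0 + np.exp(-scores))
    # Weight goals by p * (1 - p), which peaks at intermediate reachability.
    frontier_weight = p * (1.0 - p)
    probs = np.exp(np.log(frontier_weight + 1e-8) / temperature)
    probs /= probs.sum()
    idx = np.random.choice(len(replay_goals), size=num_goals, p=probs)
    return [replay_goals[i] for i in idx]

Because the critic is already trained as part of contrastive RL, this kind of proposal step needs no extra density estimator or goal-generator network, which is the property the abstract emphasizes.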
Submission Number: 141