Accelerating Goal-Conditioned RL Algorithms and Research

Published: 28 Feb 2025, Last Modified: 27 Mar 2025, WRL@ICLR 2025 Poster, CC BY 4.0
Track: full paper
Keywords: Reinforcement Learning, Goal-Conditioned Reinforcement Learning, Unsupervised Reinforcement Learning, Contrastive Learning, Benchmark
TL;DR: This paper presents JaxGCRL, a high-performance codebase and benchmark designed for self-supervised goal-conditioned reinforcement learning, offering faster training and promoting more efficient RL research.
Abstract: Self-supervision has the potential to transform reinforcement learning (RL), paralleling the breakthroughs it has enabled in other areas of machine learning. While self-supervised learning in other domains aims to find patterns in a fixed dataset, self-supervised goal-conditioned reinforcement learning (GCRL) agents discover *new* behaviors by learning from the goals achieved during unstructured interaction with the environment. However, these methods have not yet seen similar success, due both to a lack of data from slow environment simulations and to a lack of stable algorithms. We take a step toward addressing both of these issues by releasing a high-performance codebase and benchmark (`JaxGCRL`) for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU. By utilizing GPU-accelerated replay buffers, environments, and a stable contrastive RL algorithm, we reduce training time by up to $22\times$. Additionally, we assess key design choices in contrastive RL, identifying those that most effectively stabilize and enhance training performance. With this approach, we provide a foundation for future research in self-supervised GCRL, enabling researchers to quickly iterate on new ideas and evaluate them in diverse and challenging environments. Code: [https://anonymous.4open.science/r/JaxGCRL-2316/README.md](https://anonymous.4open.science/r/JaxGCRL-2316/README.md)
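
To illustrate the kind of contrastive RL objective the abstract refers to, the sketch below shows an InfoNCE-style critic loss over state-action and goal embeddings in JAX. The function name, batch layout, and symmetric softmax variant are illustrative assumptions for this note, not the exact JaxGCRL implementation.

```python
# Minimal sketch (assumptions, not the authors' exact code) of a contrastive
# critic loss for self-supervised GCRL: rows of `sa_embed` are phi(s, a),
# rows of `g_embed` are psi(g) for goals achieved later in the same trajectory,
# so the diagonal of the similarity matrix holds the positive pairs.
import jax
import jax.numpy as jnp

def contrastive_critic_loss(sa_embed, g_embed):
    """InfoNCE-style loss over a batch of (state-action, goal) embedding pairs.

    sa_embed: [B, D] state-action embeddings.
    g_embed:  [B, D] goal embeddings; row i is the positive for row i,
              all other rows serve as in-batch negatives.
    """
    logits = sa_embed @ g_embed.T                 # [B, B] similarity matrix
    labels = jnp.arange(logits.shape[0])          # diagonal entries are positives
    # Symmetric cross-entropy over rows and columns (a common variant).
    loss_sa = -jnp.mean(jax.nn.log_softmax(logits, axis=1)[labels, labels])
    loss_g = -jnp.mean(jax.nn.log_softmax(logits, axis=0)[labels, labels])
    return 0.5 * (loss_sa + loss_g)

# Usage example with random embeddings standing in for network outputs.
key_sa, key_g = jax.random.split(jax.random.PRNGKey(0))
sa = jax.random.normal(key_sa, (256, 64))
g = jax.random.normal(key_g, (256, 64))
loss = jax.jit(contrastive_critic_loss)(sa, g)
```

Because the positive goals are simply states reached later in each trajectory (hindsight relabeling) and the negatives come from other trajectories in the batch, no hand-designed reward signal is needed, which is what makes this objective self-supervised and well suited to fully GPU-resident replay buffers and environments.
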
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Michał_Bortkiewicz1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 11