Goal-conditioned Reinforcement Learning with Subgoals Generated from Relabeling

ICLR 2025 Conference Submission177 Authors

13 Sept 2024 (modified: 28 Nov 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: Goal-Conditioned Reinforcement Learning, Hindsight Experience Replay, Subgoal-based Approach
TL;DR: This paper introduces a novel subgoal-based framework for goal-conditioned reinforcement learning, where subgoals are generated through relabeling.
Abstract: In goal-conditioned reinforcement learning (RL), the primary objective is to develop a goal-conditioned policy capable of reaching diverse desired goals, a process often hindered by sparse reward signals. To address the challenges associated with sparse rewards, existing approaches frequently employ hindsight relabeling, substituting original goals with achieved goals. However, these methods tend to prioritize the optimization of closer achieved goals during training, leading to the loss of potentially valuable information from the trajectory and to low sample efficiency. Our key insight is that these achieved goals, generated from the same hindsight relabeling, can serve as effective subgoals to facilitate the learning of policies that reach possible long-horizon desired goals within the same trajectory. Leveraging this perspective, we propose a novel framework called Goal-Conditioned reinforcement learning with Q-BC (i.e., behavior cloning (BC)-regularized Q) and Subgoals (GCQS) for goal-conditioned RL. GCQS is an innovative goal-conditioned actor-critic framework that systematically exploits more trajectory information to improve policy learning and sample efficiency. Specifically, GCQS first optimizes a Q-BC objective to facilitate learning policies that reach achieved goals effectively. Subsequently, these achieved goals are redefined as subgoals, which serve to enhance the goal-conditioned policies, thereby predicting better actions to reach the desired goals. Experimental results in simulated robotics environments demonstrate that GCQS significantly enhances sample efficiency and overall performance compared to existing goal-conditioned methods. Additionally, GCQS demonstrates competitive performance on long-horizon AntMaze tasks, achieving results comparable to state-of-the-art subgoal-based methods.
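The hindsight relabeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the GCQS implementation: the function name `relabel_future`, the transition dictionary layout, the "sample a later achieved goal from the same trajectory" strategy, and the 0 / -1 sparse reward convention are all assumptions chosen for illustration.

```python
import random


def relabel_future(trajectory, rng):
    """Relabel each transition with an achieved goal from the same trajectory.

    Hypothetical layout: each step is a dict with 'state', 'action', and
    'achieved_goal' (the goal reached after taking the action).
    """
    relabeled = []
    last = len(trajectory) - 1
    for t, step in enumerate(trajectory):
        # Pick the current or a later time step; randint is inclusive on both ends.
        k = rng.randint(t, last)
        new_goal = trajectory[k]["achieved_goal"]
        # Assumed sparse reward: 0 when the relabeled goal is reached, -1 otherwise.
        reward = 0.0 if step["achieved_goal"] == new_goal else -1.0
        relabeled.append({
            "state": step["state"],
            "action": step["action"],
            "achieved_goal": step["achieved_goal"],
            "goal": new_goal,
            "reward": reward,
            "goal_index": k,  # where in the trajectory the relabeled goal came from
        })
    return relabeled


# Toy 1-D trajectory: the agent moves one unit to the right at every step.
trajectory = [
    {"state": x, "action": 1, "achieved_goal": x + 1} for x in range(4)
]
relabeled = relabel_future(trajectory, random.Random(0))
```

In this sketch, `goal_index` records how far ahead in the trajectory each relabeled goal lies, which is the kind of trajectory information the abstract argues is underused when training concentrates on nearby achieved goals.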
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 177