Goal-conditioned Reinforcement Learning with Subgoals Generated from Relabeling

Xing Lei; Zifeng Zhuang; Xuetao Zhang; Donglin Wang

Goal-conditioned Reinforcement Learning with Subgoals Generated from Relabeling

Xing Lei, Zifeng Zhuang, Xuetao Zhang, Donglin Wang

13 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Goal-Conditioned Reinforcement Learning, Hindsight Experience Replay, Subgoal-based Approach

TL;DR: This paper introduces a novel subgoal-based framework for goal-conditioned reinforcement learning, where subgoals are generated through relabeling.

Abstract: In goal-conditioned reinforcement learning (RL), the primary objective is to develop a goal-conditioned policy capable of reaching diverse desired goals, a process often hindered by sparse reward signals. To address the challenges associated with sparse rewards, existing approaches frequently employ hindsight relabeling, substituting original goals with achieved goals. However, these methods exhibit a tendency to prioritize the optimization of closer achieved goals during training, leading to the loss of potentially valuable information from the trajectory and low sample efficiency. Our key insight is that these achieved goals, generated from the same hindsight relabeling, can serve as effective subgoals to facilitate the learning of policies that reach possible long-horizon desired goals within the same trajectory. Leveraging this perspective, we propose a novel framework called Goal-Conditioned reinforcement learning with Q-BC (i.e, behavior cloning (BC)-regularized Q) and Subgoals (GCQS) for goal-conditioned RL. GCQS is a innovative goal-conditioned actor-critic framework that systematically exploits more trajectory information to improve policy learning and sample efficiency. Specifically, GCQS initially optimizes a Q-BC objective to facilitate learning policies that reach achieved goals effectively. Subsequently, these achieved goals are redefined as subgoals, which serve to enhance the goal-conditioned policies, thereby predicting better actions to reach the desired goals. Experimental results in simulated robotics environments demonstrate that GCQS significantly enhances sample efficiency and overall performance compared to existing goal-conditioned methods. Additionally, GCQS demonstrated competitive performance on long-horizon AntMaze tasks, achieving results comparable to such state-of-the-art subgoal-based methods.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 177

Loading