Keywords: Goal-Conditioned Reinforcement Learning, Data Augmentation, Stitching Property
TL;DR: This paper introduces the GCReinSL framework, which enhances SL-based methods with trajectory stitching capabilities by embedding Q-function maximization in goal-conditioned RL.
Abstract: Recent research highlights the efficacy of supervised learning (SL) as a methodology within reinforcement learning (RL), yielding commendable results. Nonetheless, investigations reveal that SL-based methods lack the stitching capability typically associated with RL approaches such as TD learning, which resolve tasks by stitching together diverse trajectory segments. This prompts the question: how can SL methods be endowed with the stitching property and bridge the gap with TD learning? This paper addresses the challenge by maximizing the goal-conditioned RL objective. We introduce the concept of Q-conditioned maximization supervised learning, grounded in the observation that the goal-conditioned RL objective is equivalent to the Q-function, thus embedding Q-function maximization into traditional SL-based methodologies. Building on this premise, we propose Goal-Conditioned Reinforced Supervised Learning (GCReinSL), which enhances SL-based approaches by incorporating Q-function maximization. GCReinSL maximizes the Q-function during training to estimate the maximum expected return within the data distribution, which then guides optimal action selection at inference. We show that GCReinSL enables SL methods to exhibit the stitching property and is effectively equivalent to applying goal data augmentation to SL methods. Experimental results on offline datasets designed to evaluate stitching capability show that our approach not only effectively selects appropriate goals across diverse trajectories but also outperforms previous works that applied goal data augmentation to SL methods.
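To make the training/inference split described in the abstract concrete, here is a minimal sketch of Q-conditioned maximization supervised learning. It assumes an expectile-style asymmetric regression (as used in IQL-like methods) to estimate the maximum in-distribution return; the class `GCReinSLSketch`, its layer sizes, and the combined loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch, not the authors' code. Assumes expectile regression
# (tau > 0.5) approximates the maximum in-distribution return, and a
# goal-conditioned SL policy additionally conditioned on the Q estimate.
import torch
import torch.nn as nn

def expectile_loss(pred, target, tau=0.9):
    """Asymmetric L2 loss; tau > 0.5 biases pred toward the max of target."""
    diff = target - pred
    weight = torch.where(diff > 0, tau, 1.0 - tau)
    return (weight * diff.pow(2)).mean()

class GCReinSLSketch(nn.Module):
    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        # Q-head: predicts the expected return for a (state, goal) pair.
        self.q_head = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
        # Policy: SL head conditioned on (state, goal, Q estimate).
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def loss(self, obs, goal, act, ret, tau=0.9):
        q = self.q_head(torch.cat([obs, goal], dim=-1))
        # Push Q toward the maximum in-distribution return (training phase).
        q_loss = expectile_loss(q, ret, tau)
        # Goal-conditioned behavior cloning, additionally conditioned on Q.
        pred_act = self.policy(torch.cat([obs, goal, q.detach()], dim=-1))
        bc_loss = (pred_act - act).pow(2).mean()
        return q_loss + bc_loss

    @torch.no_grad()
    def act(self, obs, goal):
        # At inference, condition on the predicted maximum return so the
        # policy imitates the best in-distribution behavior.
        q_max = self.q_head(torch.cat([obs, goal], dim=-1))
        return self.policy(torch.cat([obs, goal, q_max], dim=-1))
```

Conditioning the policy on the maximized Q estimate at inference is what lets the SL policy compose (stitch) the best segments across trajectories, mirroring the goal-data-augmentation effect claimed in the abstract.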
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4106