Learning to Distinguish: Behavior Gap Optimization for Goal-Conditioned Policy Learning

18 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | CC BY 4.0
Keywords: Goal-conditioned reinforcement learning, Behavior Gap, DDPG, reinforcement learning
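TL;DR: BG2RL trains goal-conditioned policies with a contrastive objective that widens the value gap between actions leading to target goals and actions leading to non-target goals, which tightens the suboptimality bound and improves success rates on MuJoCo-based robotic manipulation tasks.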
Abstract: Goal-conditioned reinforcement learning (GCRL) trains agents to accomplish a wide variety of tasks by optimizing goal-conditioned policies to achieve desired goals. However, a critical challenge in GCRL is the insufficient separation between the value estimates of optimal and suboptimal actions, a phenomenon we refer to as the Insufficient Behavior Gap, which can significantly degrade policy performance. To address this issue, we propose Behavior Gap Optimization Goal-Conditioned RL (BG2RL), a method that explicitly maximizes this gap through a contrastive optimization framework. Specifically, BG2RL samples reachable future states as target goals, which are considered positive examples, and strategically selects challenging, unachieved states from other trajectories as non-target goals, regarded as negative examples. By maximizing the value disparity between actions leading to these distinct outcomes, BG2RL learns a more discriminative value function and a more robust policy. Theoretical analysis shows that enlarging the policy gap between target and non-target goals directly tightens the suboptimality bound, providing a formal guarantee for the effectiveness of our contrastive objective. Finally, extensive experiments on challenging MuJoCo-based robotic manipulation tasks demonstrate that BG2RL significantly outperforms existing GCRL baselines in terms of success rate and exhibits more stable performance in environments with added obstacles, validating its robustness for goal-directed policy learning.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 10875
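
The goal-sampling scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual BG2RL objective, so everything here is an assumption made for illustration: the function names, the `tol` threshold used to decide that a state is "unachieved", the rule that the hardest negative is the one closest to the positive goal, and the hinge-style margin loss that stands in for the contrastive objective. How the two value estimates are paired with states and actions is also not specified, so the loss simply takes two scalars.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two states given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_positive_goal(trajectory, t, rng):
    """Pick a state visited strictly after step t in the same trajectory."""
    if t >= len(trajectory) - 1:
        raise ValueError("step t has no future state to use as a goal")
    return trajectory[rng.randrange(t + 1, len(trajectory))]


def sample_negative_goal(trajectories, idx, t, positive, tol=0.05):
    """Pick a hard negative goal from the other trajectories.

    Assumed criterion: a candidate is "unachieved" if it lies farther than
    `tol` from every future state of trajectory `idx`; among those, the one
    closest to the positive goal is treated as the most challenging.
    Returns None when no candidate qualifies.
    """
    future = trajectories[idx][t + 1:]
    candidates = [
        s
        for j, traj in enumerate(trajectories) if j != idx
        for s in traj
        if all(distance(s, f) > tol for f in future)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: distance(s, positive))


def behavior_gap_loss(q_target, q_non_target, margin=1.0):
    """Hinge penalty (assumed form): zero once the target-goal value exceeds
    the non-target-goal value by at least `margin`."""
    return max(0.0, margin - (q_target - q_non_target))


# Toy 2-D trajectories; the third state of the second trajectory nearly
# coincides with a state the first trajectory reaches, so it is excluded.
trajs = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.02, 0.0), (3.0, 1.0)],
]
rng = random.Random(0)
pos = sample_positive_goal(trajs[0], 0, rng)
neg = sample_negative_goal(trajs, 0, 0, pos)
loss = behavior_gap_loss(q_target=2.0, q_non_target=0.5)  # gap 1.5 >= margin
```

In a DDPG-style setup, a term like `behavior_gap_loss` would presumably be added to the usual critic loss, with the two inputs produced by the critic network rather than passed in as constants.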