Improving the Continuity of Goal-Achievement Ability via Policy Self-Regularization for Goal-Conditioned Reinforcement Learning
TL;DR: We propose a margin-based policy self-regularization approach to improve the continuity of goal-achievement ability for goal-conditioned reinforcement learning.
Abstract: This paper addresses the discontinuity in goal-achievement ability observed in Goal-Conditioned Reinforcement Learning (GCRL) algorithms. Through a theoretical analysis, we show that reusing successful trajectories or policies during training can help the agent achieve goals adjacent to already-achievable goals. However, the policy discrepancy between an achievable goal and its adjacent goals must be carefully managed: differences that are either trivially small or excessively large can each hinder policy performance. To address this, we propose a margin-based policy self-regularization approach that drives the policy discrepancy between adjacent desired goals toward a minimal acceptable threshold. The method can be integrated into popular GCRL algorithms such as GC-SAC, HER, and GC-PPO. Systematic evaluations on two robotic arm control tasks and a complex fixed-wing aircraft control task demonstrate that our approach significantly improves the continuity of the goal-achievement ability of GCRL algorithms, thereby enhancing their overall performance.
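The sketch below illustrates one plausible form of such a margin-based self-regularization term, assuming a deterministic goal-conditioned actor callable as `actor(obs, goal)`. The Gaussian goal perturbation, the MSE discrepancy measure, and the names `goal_noise_std` and `margin` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def margin_self_regularization_loss(actor, obs, goals,
                                    goal_noise_std=0.05, margin=0.1):
    """Hinge-style penalty on the policy discrepancy between a desired goal
    and a nearby (perturbed) goal, keeping it close to a small margin rather
    than letting it grow arbitrarily large.

    NOTE: this is a hedged sketch; the actor signature, the perturbation
    scheme, and the hyperparameter values are assumptions for illustration.
    """
    # Sample adjacent goals by slightly perturbing the desired goals.
    adjacent_goals = goals + goal_noise_std * torch.randn_like(goals)

    # Actions the current policy proposes for the original and adjacent goals.
    actions = actor(obs, goals)
    adjacent_actions = actor(obs, adjacent_goals)

    # Per-sample discrepancy between the two goal-conditioned actions.
    discrepancy = F.mse_loss(actions, adjacent_actions,
                             reduction="none").mean(dim=-1)

    # Penalize only the part of the discrepancy that exceeds the margin,
    # so nearby goals map to nearby (but not forced-identical) behaviors.
    return F.relu(discrepancy - margin).mean()
```

In practice, a term of this kind would be added (with a weighting coefficient) to the actor loss of the base GCRL algorithm, e.g. GC-SAC or GC-PPO.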
Lay Summary: Training RL agents to achieve different goals, like turning an aircraft by specific angles, often leads to unexpected gaps in performance. For example, an agent might succeed at a 30-degree turn but fail at 30.1 degrees, even though the goals are nearly identical. This inconsistency suggests that current methods do not fully leverage past successes to handle similar, slightly different goals.
Our research identifies why this happens and introduces a simple but effective solution: we ensure the agent's policy for one goal is adjusted smoothly for nearby goals, avoiding changes that are either overly rigid or overly drastic. This approach, called Margin-Based Policy Self-Regularization (MSR), improves performance across tasks such as robotic arm control and aircraft maneuvering, making RL agents more reliable and adaptable.
By integrating MSR into existing algorithms, we demonstrate more consistent success rates across nearby goals and better overall performance, bridging the gaps that previously hindered goal-conditioned RL algorithms.
Link To Code: https://github.com/GongXudong/fly-craft-examples
Primary Area: Reinforcement Learning->Policy Search
Keywords: Goal-Conditioned Reinforcement Learning, Policy Regularization, Continuity of Goal-Achievement Ability
Submission Number: 4209