Harmonic Constrained Reinforcement Learning

ICLR 2026 Conference Submission 13236 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Constrained Reinforcement Learning, Gradient Conflict
TL;DR: We introduce a harmonic constrained reinforcement learning (HCRL) framework, designed to resolve conflicts between reward and safety at the gradient level in an optimal manner.
Abstract: Constrained reinforcement learning (CRL) aims to train agents that maximize rewards while satisfying safety constraints, an essential requirement for real-world applications. Despite extensive progress with various constrained optimization techniques, striking a stable balance between reward maximization and constraint satisfaction remains challenging: reward-driven updates often violate constraints, while overly safety-driven updates degrade performance. To address this conflict, we propose harmonic constrained reinforcement learning (HCRL), a framework that resolves the reward-safety trade-off at the gradient level in an optimal manner. At each iteration, HCRL formulates a trust-region minimax optimization problem to compute a harmonic gradient (HG) for the policy update. This gradient has minimal conflict with both the reward and safety objective gradients, enabling more stable and balanced policy learning. In practice, the challenging constrained minimax problem for computing the HG can be equivalently reformulated as an unconstrained single-variable optimization problem, keeping the method highly time-efficient. Empirical results on three planar constrained optimization problems and ten Safety Gymnasium tasks demonstrate that HCRL consistently outperforms existing CRL baselines in stability and in its ability to find feasible, optimal policies.
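The abstract describes the harmonic gradient only at a high level, so the following is a minimal sketch of what such an update could look like in the two-gradient case, assuming the trust-region minimax reduces to choosing a single scalar weight between the reward and safety gradients (a min-norm, MGDA/CAGrad-style combination). The function name, closed-form weight, and trust-region projection are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a harmonic-gradient update for two objectives.
# Assumption: with one reward gradient and one safety gradient, the
# constrained minimax collapses to a single scalar alpha in [0, 1]
# (min-norm closed form); the paper's actual HG may differ.
import numpy as np

def harmonic_gradient(g_reward: np.ndarray, g_safety: np.ndarray,
                      trust_radius: float = 1.0) -> np.ndarray:
    """Return an update direction with minimal conflict against both gradients."""
    diff = g_safety - g_reward
    denom = float(diff @ diff)
    if denom < 1e-12:                      # gradients (nearly) identical
        alpha = 0.5
    else:
        # Closed-form minimizer of ||alpha*g_reward + (1-alpha)*g_safety||^2
        alpha = float(np.clip((g_safety @ diff) / denom, 0.0, 1.0))
    d = alpha * g_reward + (1.0 - alpha) * g_safety
    norm = np.linalg.norm(d)
    if norm > trust_radius:                # project onto the trust region
        d = d * (trust_radius / norm)
    return d

# Example with conflicting reward and safety gradients
g_r = np.array([1.0, 0.2])
g_c = np.array([-0.5, 1.0])
print(harmonic_gradient(g_r, g_c))
```

The single scalar alpha is the "single-variable" quantity in this sketch; solving for it is trivial here, which is consistent with the abstract's claim of high time-efficiency, though the paper's actual reformulation is not specified in this page.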
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13236