Feasible Constraint Policy Optimization for Safe Reinforcement Learning

Published: 19 Dec 2025, Last Modified: 27 Dec 2025, AAMAS 2026 Full, CC BY 4.0
Keywords: Reinforcement Learning, Safe Reinforcement Learning, LEARN
Abstract: Safe reinforcement learning (RL) is crucial for learning policies that conform to explicit constraints in safety-critical applications. However, existing primal-dual methods exhibit inherent instability, and trust region-based approaches, due to initialization and approximation errors, often yield infeasible policies during training. We introduce Feasible Constraint Policy Optimization (FCPO), which combines penalty and trust region methods to address policy feasibility while ensuring stability and performance. FCPO decomposes the constrained optimization problem with the Alternating Direction Method of Multipliers (ADMM), enabling efficient optimization using only first-order information. Comprehensive experiments show that FCPO consistently outperforms the baselines in both performance and constraint satisfaction across the majority of tasks.
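The abstract does not spell out the decomposition, but a generic consensus-ADMM splitting illustrates the idea: duplicate the policy parameters into a reward variable theta and a constraint variable omega, tie them with a consensus constraint, and alternate the updates. The sketch below is an assumption-laden illustration, not the paper's exact FCPO formulation; the objectives J_R and J_C, the constraint limit d, penalty weight kappa, ADMM parameter rho, and multiplier lambda are all illustrative symbols.

```latex
% Generic consensus-ADMM sketch for a penalized safe-RL objective
% (illustrative assumption; not the paper's exact formulation).
\begin{align*}
&\min_{\theta,\,\omega}\; -J_R(\theta) + \kappa\,\max\!\bigl(0,\, J_C(\omega) - d\bigr)
  \quad \text{s.t. } \theta = \omega \\[4pt]
&\mathcal{L}_\rho(\theta,\omega,\lambda)
  = -J_R(\theta) + \kappa\,\max\!\bigl(0,\, J_C(\omega) - d\bigr)
  + \lambda^{\top}(\theta - \omega) + \tfrac{\rho}{2}\,\lVert \theta - \omega \rVert_2^2 \\[4pt]
&\theta^{k+1} = \arg\min_{\theta}\; \mathcal{L}_\rho(\theta, \omega^{k}, \lambda^{k})
  \quad \text{(reward / trust-region step)} \\
&\omega^{k+1} = \arg\min_{\omega}\; \mathcal{L}_\rho(\theta^{k+1}, \omega, \lambda^{k})
  \quad \text{(penalty / feasibility step)} \\
&\lambda^{k+1} = \lambda^{k} + \rho\,(\theta^{k+1} - \omega^{k+1})
  \quad \text{(dual ascent)}
\end{align*}
```

Each subproblem touches only one part of the objective, which is what makes a first-order, gradient-based solver practical for each alternating step.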
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 35