Rectified Robust Policy Optimization for Robust Constrained Reinforcement Learning without Strong Duality

20 Jan 2025 (modified: 18 Jun 2025) · Submitted to ICML 2025 · CC BY-ND 4.0
TL;DR: Strong duality does not hold in general for robust constrained RL; we address this with a primal-only algorithm.
Abstract: Robust constrained reinforcement learning (RL) seeks to optimize an agent's performance under model uncertainties while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does not generally hold in robust constrained RL, indicating that traditional primal-dual methods may fail to find optimal feasible policies. To overcome this limitation, we propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO), which operates directly on the primal problem without relying on dual formulations. We provide theoretical convergence guarantees for RRPO, showing that it converges to an approximately optimal policy that satisfies the constraints within a specified tolerance. Empirical results in a grid-world environment validate the effectiveness of our approach, demonstrating that RRPO achieves robust and safe performance under model uncertainties, whereas the non-robust baseline violates the worst-case safety constraints.
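To make the primal-only idea concrete, below is a minimal, illustrative sketch of one plausible rectified update: the worst-case return over a small finite uncertainty set of transition kernels is maximized, with a rectified penalty max(0, worst-case cost − budget) subtracted instead of a Lagrangian dual term. All names (the toy environment, `lam`, `budget`, the finite-difference ascent) are hypothetical assumptions for illustration; this is not the paper's actual RRPO update rule or analysis.

```python
# Sketch: primal-only policy search with a rectified constraint penalty
# over a small finite uncertainty set of transition models (assumed setup).
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
budget, lam, lr = 1.0, 5.0, 0.1          # cost budget, penalty weight, step size
rng = np.random.default_rng(0)

def random_kernel():
    """One perturbed transition kernel P[s, a, s'] (rows sum to 1)."""
    P = rng.random((n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)

uncertainty_set = [random_kernel() for _ in range(3)]
reward = rng.random((n_states, n_actions))   # task reward r(s, a)
cost = rng.random((n_states, n_actions))     # constraint cost c(s, a)

def policy(theta):
    """Softmax policy pi(a|s) from logits theta[s, a]."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def discounted_return(pi, P, signal):
    """Exact discounted value of `signal` (reward or cost) from state 0."""
    Ppi = np.einsum("sa,sap->sp", pi, P)      # state transitions under pi
    rpi = np.einsum("sa,sa->s", pi, signal)   # expected one-step signal
    v = np.linalg.solve(np.eye(n_states) - gamma * Ppi, rpi)
    return v[0]

def rectified_objective(theta):
    """Worst-case reward minus a rectified penalty on worst-case cost."""
    pi = policy(theta)
    worst_reward = min(discounted_return(pi, P, reward) for P in uncertainty_set)
    worst_cost = max(discounted_return(pi, P, cost) for P in uncertainty_set)
    return worst_reward - lam * max(0.0, worst_cost - budget)

# Primal-only ascent via finite differences (illustration only; no dual variable).
theta = np.zeros((n_states, n_actions))
eps = 1e-4
for step in range(200):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        d = np.zeros_like(theta)
        d[idx] = eps
        grad[idx] = (rectified_objective(theta + d)
                     - rectified_objective(theta - d)) / (2 * eps)
    theta += lr * grad

print("final rectified objective:", rectified_objective(theta))
```

The rectified penalty keeps the update entirely in the primal space: when the worst-case cost is within budget the penalty vanishes and the update maximizes worst-case reward alone, so no dual variable (and hence no reliance on strong duality) is needed.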
Primary Area: Theory->Reinforcement Learning and Planning
Keywords: reinforcement learning, robust reinforcement learning, constrained reinforcement learning
Submission Number: 3319