Safety is a key challenge in reinforcement learning (RL), especially in real-world applications like autonomous driving and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to incorporate safety constraints while optimizing performance. However, current methods often incur significant safety violations during exploration or suffer from high regret, i.e., the performance loss relative to an optimal policy. We propose a low-switching primal-dual algorithm that balances regret with bounded constraint violations, drawing on techniques from online learning and CMDPs. Our approach minimizes policy changes through low-switching updates and improves sample efficiency via empirical Bernstein-based bonuses. This leads to tighter theoretical bounds on regret and safety, achieving a state-of-the-art regret of $\tilde{O}(\sqrt{SAH^5K}/(\tau - c^0))$, where $S$ and $A$ are the numbers of states and actions, $H$ is the horizon, $K$ is the number of episodes, and $(\tau - c^0)$ is the safety margin of a known safe baseline policy. Our method also guarantees an $\tilde{O}(1)$ constraint violation and removes unnecessary dependencies on the state space $S$ and planning horizon $H$ in the reward regret, offering a scalable solution for constrained RL in complex environments.
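To make the three ingredients named above concrete, the minimal Python sketch below illustrates (under simplifying assumptions, not as the paper's exact algorithm) how a primal-dual CMDP learner can combine (i) a planning step over the Lagrangian $r - \lambda c$, (ii) a projected gradient-ascent update of the Lagrange multiplier, (iii) a doubling-based low-switching trigger for policy updates, and (iv) empirical Bernstein exploration bonuses. The toy environment, all hyperparameters, and the `bernstein_bonus`/`replan` helpers are illustrative assumptions; for brevity the transition kernel is treated as known, only rewards receive a bonus, and the known safe baseline policy with margin $(\tau - c^0)$ is omitted.

```python
import numpy as np

# Illustrative low-switching primal-dual sketch for a tabular CMDP,
# using the abstract's notation: S states, A actions, horizon H, K episodes.
rng = np.random.default_rng(0)
S, A, H, K = 5, 3, 8, 500
tau = 0.3 * H        # per-episode cumulative-cost budget (assumption)
eta = 0.05           # dual step size (assumption)
delta = 0.01         # confidence level for the Bernstein bonus (assumption)

# Toy CMDP: transition kernel P[s, a], mean rewards r and mean costs c in [0, 1].
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))
c = rng.uniform(size=(S, A))

reward_samples = [[[] for _ in range(A)] for _ in range(S)]
cost_samples = [[[] for _ in range(A)] for _ in range(S)]
visit = np.zeros((S, A), dtype=int)
visit_at_switch = np.zeros((S, A), dtype=int)
lam = 0.0                                # Lagrange multiplier (dual variable)
policy = np.zeros((H, S), dtype=int)     # deterministic policy per step

def bernstein_bonus(samples):
    """Empirical-Bernstein bonus: scales with the observed variance, so it is
    tighter than a Hoeffding bonus when outcomes are low-variance."""
    n = len(samples)
    if n < 2:
        return 1.0                       # maximal optimism before enough data
    var = np.var(samples, ddof=1)
    log_t = np.log(3.0 / delta)
    return np.sqrt(2.0 * var * log_t / n) + 3.0 * log_t / n

def replan(lam):
    """Backward induction on the Lagrangian r - lam * c with reward bonuses.
    (The paper's bonuses also cover costs and estimated transitions.)"""
    pi = np.zeros((H, S), dtype=int)
    V = np.zeros(S)
    for h in reversed(range(H)):
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                r_hat = np.mean(reward_samples[s][a]) if reward_samples[s][a] else 0.0
                c_hat = np.mean(cost_samples[s][a]) if cost_samples[s][a] else 0.0
                bonus = bernstein_bonus(reward_samples[s][a])
                Q[s, a] = (r_hat + bonus) - lam * c_hat + P[s, a] @ V
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

for k in range(K):
    # Low-switching rule: re-plan only when some (s, a) count has doubled
    # since the last switch, keeping the number of policy updates logarithmic in K.
    if k == 0 or np.any(visit >= np.maximum(2 * visit_at_switch, 1)):
        policy = replan(lam)
        visit_at_switch = visit.copy()

    s, ep_cost = 0, 0.0
    for h in range(H):
        a = policy[h, s]
        rew = float(rng.random() < r[s, a])    # Bernoulli reward sample
        cost = float(rng.random() < c[s, a])   # Bernoulli cost sample
        reward_samples[s][a].append(rew)
        cost_samples[s][a].append(cost)
        visit[s, a] += 1
        ep_cost += cost
        s = rng.choice(S, p=P[s, a])

    # Dual update: projected gradient ascent on the constraint violation.
    lam = max(0.0, lam + eta * (ep_cost - tau))

print(f"final Lagrange multiplier: {lam:.3f}")
```

The doubling trigger is what keeps the number of policy switches small (logarithmic in $K$ per state-action pair in this sketch), while the variance-aware Bernstein bonus replaces a worst-case Hoeffding bonus and is one route toward the tighter dependence on $S$ and $H$ claimed in the abstract.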