Safe Reinforcement Learning Framework Under a Linear Programming Formulation

Published: 22 Sept 2025 · Last Modified: 22 Sept 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: Reinforcement learning, safe RL, linear optimization, constrained optimization
Abstract: Safe reinforcement learning (RL) is essential for deploying agents in safety-critical domains, yet many existing approaches based on Lagrangian relaxations suffer from drawbacks such as training instability and sensitivity to hyperparameters. We propose an alternative perspective that formulates constrained RL as a linear program over the state–action occupancy measure, enabling convex optimization of expected returns subject to linear constraints. This framework allows a wide range of safety constraints to be incorporated in a natural and modular manner while preserving interpretability. We develop a stochastic primal–dual algorithm based on this framework, and empirical results demonstrate that our approach achieves near-optimal returns while consistently respecting safety constraints, exhibiting greater robustness to hyperparameter variations than standard baselines. These findings establish a promising direction for robust and reliable safe RL.
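For context, a minimal sketch of the occupancy-measure linear program that formulations of this kind typically build on is given below. It is not taken from the submission; the symbols (discount factor $\gamma$, initial-state distribution $\rho_0$, transition kernel $P$, reward $r$, safety cost functions $c_i$ with budgets $d_i$) are standard notation assumed here rather than defined in the abstract.

```latex
% Sketch of the standard occupancy-measure LP for constrained RL
% (assumed notation; not the submission's exact formulation).
\begin{align*}
  \max_{\mu \ge 0} \quad & \sum_{s,a} \mu(s,a)\, r(s,a) \\
  \text{s.t.} \quad
  & \sum_{a} \mu(s',a)
    = (1-\gamma)\,\rho_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a)
    && \forall s' \quad \text{(flow constraints)} \\
  & \sum_{s,a} \mu(s,a)\, c_i(s,a) \le d_i
    && i = 1,\dots,m \quad \text{(linear safety constraints)}
\end{align*}
% A policy can be recovered from an optimal occupancy measure via
% \pi(a \mid s) = \mu(s,a) \big/ \sum_{a'} \mu(s,a').
```

In formulations of this type, the safety requirements enter only as additional linear constraints on $\mu$, and a stochastic primal–dual method would operate on the Lagrangian of this LP with one multiplier per safety constraint; the abstract above describes such an algorithm without specifying its exact form.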
Submission Number: 368