Embedding Safety into RL: A New Take on Trust Region Methods

Published: 17 Jul 2025, Last Modified: 06 Sept 2025. EWRL 2025 Poster. License: CC BY 4.0
Keywords: Constrained MDP, Information Geometry, Safe RL
TL;DR: We introduce Constrained Trust Region Policy Optimization (C-TRPO), which incorporates the constraints into the policy geometry and extends TRPO to constrained MDPs.
Abstract: Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by imposing safety constraints, yet existing methods either sacrifice reward maximization or permit constraint violations during training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the geometry of the policy space so that trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and its connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
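For intuition, one way to realize trust regions that contain only safe policies is to replace the KL divergence in the TRPO subproblem with a divergence that blows up at the constraint boundary. The sketch below uses illustrative notation (the barrier $\phi$, weight $\beta$, cost budget $b$, expected constraint cost $J_C$, and Bregman divergence $D_\phi$ are our labels, not necessarily the paper's exact construction):

$$
D_C(\pi \,\|\, \pi_k) \;=\; \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \;+\; \beta\, D_\phi\!\big(b - J_C(\pi) \,\big\|\, b - J_C(\pi_k)\big),
\qquad
\pi_{k+1} \in \arg\max_{\pi}\; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\big[A^{\pi_k}(s,a)\big]
\ \ \text{s.t.}\ \ D_C(\pi \,\|\, \pi_k) \le \delta .
$$

Here $\phi$ is a convex barrier (e.g. $\phi(x) = -\log x$) whose Bregman divergence grows without bound as the safety margin $b - J_C(\pi)$ approaches zero, so every policy inside a trust region around a safe $\pi_k$ remains feasible; setting $\beta = 0$ recovers the standard TRPO update.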
Track: Fast Track: published work
Submission Number: 63