Embedding Safety into RL: A New Take on Trust Region Methods

Nikola Milosevic; Johannes Müller; Nico Scherf

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0

Abstract: Reinforcement Learning (RL) agents can solve diverse tasks but often exhibit unsafe behavior. Constrained Markov Decision Processes (CMDPs) address this by enforcing safety constraints, yet existing methods either sacrifice reward maximization or allow unsafe training. We introduce Constrained Trust Region Policy Optimization (C-TRPO), which reshapes the policy space geometry to ensure trust regions contain only safe policies, guaranteeing constraint satisfaction throughout training. We analyze its theoretical properties and connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Experiments show that C-TRPO reduces constraint violations while maintaining competitive returns.
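
For orientation, the constrained problem the abstract refers to can be written in its standard textbook form. This is a sketch under assumed notation, not taken from this page: the symbols V_r, V_c, the cost budget b, the trust region radius \delta, and the generic divergence D are all introduced here for illustration, and the paper's exact definitions may differ.

```latex
% Standard CMDP objective.
% Assumed notation: V_r = expected discounted return,
% V_c = expected discounted cost, b = cost budget.
\max_{\pi} \; V_r(\pi) \quad \text{subject to} \quad V_c(\pi) \le b

% Generic trust-region step around the current policy \pi_k.
% Assumed notation: A_r^{\pi_k} = reward advantage of \pi_k,
% d^{\pi_k} = state visitation distribution of \pi_k,
% D = a divergence between policies, \delta = trust region radius.
\pi_{k+1} \in \arg\max_{\pi} \;
  \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi(\cdot \mid s)}
  \big[ A_r^{\pi_k}(s, a) \big]
\quad \text{subject to} \quad D(\pi_k, \pi) \le \delta
```

In TRPO, D is an average KL divergence between \pi_k and \pi. Read against this template, the abstract's claim is that C-TRPO chooses the geometry so that the trust region \{\pi : D(\pi_k, \pi) \le \delta\} contains only policies with V_c(\pi) \le b.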

Lay Summary: Reinforcement learning (RL) is a type of AI that learns by trial and error, often achieving impressive results in games, robotics, and other tasks that require reasoning in multiple steps. But this trial-and-error process can lead to unsafe behavior while the system is still learning—like breaking rules or taking risky actions. Our work introduces a new method called Constrained Trust Region Policy Optimization (C-TRPO) that helps RL systems stay safe while learning without making any specific assumptions about the task. Instead of allowing the system to explore freely and hoping it stays within limits, C-TRPO carefully guides the learning process so that all new behaviors are safe by design. This means it avoids unsafe actions not just at the end, but throughout training. We also show how our method connects to other popular approaches and test it on several tasks. The results show that C-TRPO keeps the system within safety limits while still performing well.

Link To Code: https://github.com/milosen/ctrpo

Primary Area: Reinforcement Learning->Deep RL

Keywords: Constrained MDP, Information Geometry, Safe RL

Submission Number: 6813
