An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning

Published: 21 Nov 2025, Last Modified: 21 Nov 2025 · DiffSys 2025 · CC BY 4.0
Keywords: Safe Reinforcement Learning, Lagrangian Methods, Constrained Reinforcement Learning, Reinforcement Learning
Abstract: In safety-critical domains such as robotics and navigation, agents must balance performance with safety constraints. Safe reinforcement learning (SRL) addresses this challenge, with Lagrangian methods being a widely used approach. Their effectiveness, however, depends strongly on the choice of the Lagrange multiplier λ, which governs the trade-off between return and constraint cost. We analyze (i) optimality and (ii) stability of Lagrange multipliers in SRL across multiple benchmark tasks. By constructing λ-profiles, we visualize the sensitivity of performance to λ and show that automated updates of λ can recover or even surpass the performance of optimally tuned fixed multipliers. While PID-controlled updates can reduce oscillations, this method requires careful tuning, emphasizing the need for more robust stabilization strategies for Lagrangian methods in SRL. Code for reproducing our results is available at https://github.com/lindsayspoor/Lagrangian_SafeRL.
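The PID-controlled multiplier updates mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation (see their repository for that); it assumes a hypothetical `PIDLambda` class with illustrative gains `kp`, `ki`, `kd` and a cost limit d, and applies proportional-integral-derivative feedback to the constraint violation, projecting λ onto the nonnegative reals:

```python
class PIDLambda:
    """Sketch of a PID-controlled Lagrange multiplier update (illustrative only).

    The multiplier lambda is driven by the constraint violation
    error = episode_cost - cost_limit, so lambda rises when the agent
    violates the constraint and decays toward zero when it is satisfied.
    """

    def __init__(self, kp=0.1, ki=0.01, kd=0.05, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd   # illustrative gains, not tuned values
        self.cost_limit = cost_limit             # constraint threshold d
        self.integral = 0.0                      # accumulated violation
        self.prev_cost = None                    # for the derivative term

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit
        # Clip the integral at zero (anti-windup) so long safe stretches
        # do not build up a large negative term.
        self.integral = max(0.0, self.integral + error)
        deriv = 0.0 if self.prev_cost is None else episode_cost - self.prev_cost
        self.prev_cost = episode_cost
        # Project lambda onto [0, inf): a negative multiplier has no meaning.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * deriv)
```

For example, feeding in episode costs above the limit drives λ up, increasing the weight on the cost term in the Lagrangian objective; costs below the limit let λ relax back to zero. The derivative term is what damps the oscillations that plain (integral-only) Lagrangian updates exhibit, though, as the abstract notes, the gains themselves still require careful tuning.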
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 11