Constrained Reinforcement Learning with Smoothed Log Barrier Function

TMLR Paper 5434 Authors

21 Jul 2025 (modified: 29 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: Deploying reinforcement learning (RL) in real-world systems often requires satisfying strict safety constraints during both training and deployment, which simple reward shaping typically fails to enforce. Existing constrained RL algorithms face two persistent challenges: training instability and overly conservative policies. To overcome these limitations, we propose CSAC-LB (Constrained Soft Actor-Critic with Log Barrier), a model-free, sample-efficient, off-policy algorithm that requires no pre-training. CSAC-LB integrates a linear smoothed log barrier function into the actor's objective, providing a numerically stable, non-vanishing gradient that enables the agent to quickly recover from unsafe states while avoiding the instability of traditional interior-point methods. To further enhance safety and mitigate the underestimation of constraint violations, we employ a pessimistic double-critic architecture for the cost function, taking the maximum of two cost Q-networks to conservatively guide the policy. Through extensive experiments on challenging constrained control tasks, we demonstrate that CSAC-LB significantly outperforms baselines, consistently achieving high returns while strictly adhering to safety constraints. Our results establish CSAC-LB as a robust and stable solution for applying RL to safety-critical domains.
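As a rough illustration of the mechanism the abstract describes, the sketch below implements one standard linear smoothed log-barrier formulation from the interior-point literature (log barrier below a threshold, continued as a line matched in value and slope above it) together with the pessimistic max over two cost critics. The exact form, hyperparameters, and loss used in the paper may differ; the names `q_reward`, `qc1`, `qc2`, `cost_limit`, and the sharpness factor `t` are illustrative assumptions, not taken from the submission.

```python
import math
import torch

def smoothed_log_barrier(z: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Linear smoothed log barrier for a constraint of the form z <= 0.

    For z <= -1/t^2 this is the classic log barrier -(1/t) * log(-z);
    past that point it continues as a line of slope t, matched in value
    and first derivative, so the gradient stays finite and non-vanishing
    even when the constraint is violated (z > 0).
    """
    threshold = -1.0 / t ** 2
    # clamp keeps the branch of torch.where that is not selected free of NaNs
    log_branch = -(1.0 / t) * torch.log(torch.clamp(-z, min=1e-8))
    linear_branch = t * z - (1.0 / t) * math.log(1.0 / t ** 2) + 1.0 / t
    return torch.where(z <= threshold, log_branch, linear_branch)

def actor_loss(q_reward: torch.Tensor,
               qc1: torch.Tensor,
               qc2: torch.Tensor,
               cost_limit: float = 25.0,
               t: float = 2.0) -> torch.Tensor:
    """Hypothetical actor objective in the spirit of the abstract:
    trade the reward critic off against a barrier penalty on a
    pessimistic cost estimate. Taking the max of the two cost
    Q-networks guards against underestimating constraint violations.
    """
    qc_pessimistic = torch.max(qc1, qc2)
    penalty = smoothed_log_barrier(qc_pessimistic - cost_limit, t=t)
    return (-q_reward + penalty).mean()
```

Because the linear branch takes over smoothly where the log barrier would diverge, the penalty is defined and differentiable even for infeasible cost estimates, which is what lets the policy receive a useful recovery gradient from unsafe states instead of the unbounded gradients of a pure interior-point barrier.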
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matteo_Papini1
Submission Number: 5434