Bridging Safety and Performance in Autonomous Systems using Offline Reinforcement Learning
Keywords: Safe reinforcement learning, offline reinforcement learning, flow matching
Abstract: Offline reinforcement learning (RL) offers a promising framework for deploying autonomous systems in safety-critical settings without the risks of online exploration. However, learning policies that simultaneously achieve high performance and strong safety guarantees from fixed datasets remains a fundamental challenge. Many existing safe offline RL approaches rely on soft constraint formulations, which may permit safety violations and are sensitive to distributional shift. In contrast, formal methods such as Hamilton–Jacobi (HJ) reachability and Control Barrier Functions (CBFs) provide rigorous safety guarantees but often yield overly conservative solutions that neglect performance. In this work, we bridge this gap by formulating safe offline RL as a state-constrained optimal control problem, where safety is enforced through hard state constraints and performance is captured via a reward function. The resulting value function satisfies a Hamilton–Jacobi–Bellman (HJB) equation, which we approximate using offline RL on fixed datasets. This formulation enables principled integration of safety guarantees with data-driven policy optimization. Empirically, across safety-critical benchmarks including boat navigation and Safety-Gymnasium tasks, our approach achieves competitive returns while exhibiting near-zero constraint violations, demonstrating a favorable balance between safety and performance in the offline setting.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 161