Keywords: Sequential Decision Making, Offline Safe RL, Minimax Optimization, No-Regret Algorithms
TL;DR: Learning safe, reward-maximizing policies from offline data via a minimax optimization formulation solved with no-regret algorithms
Abstract: We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical approximation that can be combined with any offline RL algorithm, eliminating the need for offline policy evaluation. Empirical results on the DSRL benchmark demonstrate that our method reliably enforces safety constraints under stringent cost budgets, while achieving high rewards. The code is available at https://github.com/yassineCh/O3SRL.
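A minimal, hypothetical sketch of the recipe the abstract describes: an outer no-regret update (projected online gradient ascent) on a Lagrange multiplier, wrapped around an inner "offline RL oracle" that maximizes the Lagrangian reward r − λ·c. The toy bandit dataset, helper estimates, and hyperparameters below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset of (action, reward, cost) samples for 3 actions.
actions = rng.integers(0, 3, size=3000)
rewards = np.array([1.0, 2.0, 3.0])[actions] + 0.1 * rng.standard_normal(3000)
costs = np.array([0.0, 0.5, 2.0])[actions] + 0.1 * rng.standard_normal(3000)

# Empirical per-action reward/cost estimates (stand-in for the offline RL
# oracle's value estimates; purely illustrative).
r_hat = np.array([rewards[actions == a].mean() for a in range(3)])
c_hat = np.array([costs[actions == a].mean() for a in range(3)])

budget, lam, step, lam_max = 1.0, 0.0, 0.5, 10.0
chosen = []

for t in range(100):
    # Inner step: "offline RL oracle" = best response to the Lagrangian reward.
    a = int(np.argmax(r_hat - lam * c_hat))
    chosen.append(a)

    # Outer step: no-regret dual update (projected online gradient ascent),
    # raising lam when the constraint is violated and lowering it otherwise.
    lam = float(np.clip(lam + step * (c_hat[a] - budget), 0.0, lam_max))

# The output policy is the uniform mixture over the iterates, as minimax
# analyses of this style suggest.
mix = np.bincount(chosen, minlength=3) / len(chosen)
print("mixture policy:", mix, "expected cost:", mix @ c_hat, "budget:", budget)
```

In this sketch the mixture policy concentrates on the highest-reward action whose empirical cost keeps the averaged iterates within the budget, mirroring the trade-off the minimax objective encodes.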
Supplementary Material:  zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 23414