AEGIS: Almost Surely Safe Offline Reinforcement Learning

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Safe Reinforcement Learning, Offline Reinforcement Learning, Diffusion Models
TL;DR: We guide diffusion policies in offline safe RL using almost-sure feasibility critics, enabling safe control for any feasible budget.
Abstract: Ensuring safety guarantees in offline reinforcement learning remains challenging, especially when safety constraints must hold almost surely, i.e., along every possible trajectory. Moreover, since pre-specifying a single safety budget (constraint threshold) is often difficult, it is desirable to learn a foundation policy that can be deployed across a broad range of budgets. We introduce AEGIS (Almost-Sure Epigraph-Guided Implicit Safety), an almost-surely safe offline RL framework that guides diffusion policy training via critics that respect constraints across all feasible budgets. AEGIS characterizes the feasible set of initial state-budget pairs as the epigraph of a feasibility critic updated via a worst-case backup. Building on this characterization, we extend Implicit Q-Learning (IQL) to train both feasibility and reward critics, and use these critics to bias a diffusion policy toward high-value feasible actions. AEGIS thus turns the diffusion model from a generative prior into a safety-aware controller, enabling a single general policy to respect varying budgets without further tuning. Empirical results on the DSRL benchmark and humanoid locomotion tasks show that AEGIS achieves high feasibility with competitive returns, generalizing across feasible constraint thresholds.
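To make the abstract's recipe concrete, below is a minimal PyTorch sketch of two of the named ingredients: a feasibility critic over state-budget pairs, and an IQL-style expectile regression toward a worst-case (almost-sure) backup target. The class and function names (`FeasibilityCritic`, `worst_case_backup`, `expectile_loss`), the network architecture, the exact form of the backup, and all hyperparameters are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class FeasibilityCritic(nn.Module):
    """V_h(s, z): feasibility value over a state s and remaining safety budget z.
    The feasible set of initial state-budget pairs is characterized as the
    sublevel set {(s, z) : V_h(s, z) <= 0}, i.e., the epigraph view in the abstract."""

    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, budget: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, budget], dim=-1)).squeeze(-1)


def worst_case_backup(cost: torch.Tensor, v_next: torch.Tensor,
                      budget: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Almost-sure (worst-case) feasibility backup: since the constraint must
    hold along every trajectory, the target takes the max of the instantaneous
    budget violation and the future feasibility value (an assumed form)."""
    violation = cost - budget  # > 0 means this step alone overspends the budget
    return torch.maximum(violation, gamma * v_next)


def expectile_loss(pred: torch.Tensor, target: torch.Tensor,
                   tau: float = 0.9) -> torch.Tensor:
    """IQL-style asymmetric (expectile) regression, which fits critics from
    offline data without querying out-of-distribution actions."""
    diff = target - pred
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


# Toy usage on random tensors, just to show the pieces fit together:
critic = FeasibilityCritic(state_dim=4)
s, z = torch.randn(8, 4), torch.rand(8, 1)
pred = critic(s, z)
target = worst_case_backup(cost=torch.rand(8), v_next=torch.randn(8),
                           budget=z.squeeze(-1))
loss = expectile_loss(pred, target.detach())
loss.backward()
```

In a full pipeline, an analogous reward critic would be trained alongside this feasibility critic, and both would then bias the diffusion policy's sampling toward high-value actions within the feasible set; the exact guidance rule is part of the paper's method and is not reproduced here.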
Primary Area: reinforcement learning
Submission Number: 23775