Keywords: Safe Control, HJ Reachability, Value Function, Control Barrier Function
TLDR: We propose a framework that learns a Control Barrier Function from offline demonstrations, using value functions to enforce hard safety constraints.
Abstract: Deploying autonomous systems in safety-critical domains requires hard, state-wise safety guarantees. Most offline RL methods enforce only soft constraints, which leaves a non-zero risk of violation, and they struggle with high-dimensional systems or complex actuation limits. We introduce Value-Guided Offline Control Barrier Function (V-OCBF), a framework that draws on Hamilton-Jacobi reachability to learn a forward-invariant, actuation-aware Neural CBF from offline demonstrations. Our pipeline yields an optimal control barrier function that identifies the maximal safe set and can be certified as a valid Neural CBF, encouraging hard safety that respects control limits without risky online interaction. We first validate V-OCBF on a simple Dubins car and then evaluate it on several high-dimensional Gymnasium tasks (HalfCheetah, Hopper, Swimmer, Ant, Walker2D), where it consistently outperforms prior methods in safety and effectiveness.
Submission Number: 25