Keywords: Adversarial Reinforcement Learning, Online Model Predictive Shielding, Hamilton Jacobi Reachability Analysis
TL;DR: We propose Iterative Soft Adversarial Actor-Critic for Safety (ISAACS), which consists of an offline uncertainty-aware safety policy training stage and an online robust model-based verification stage.
Abstract: The deployment of robots in uncontrolled environments requires them to operate robustly un- der previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable “deep” methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error, by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial “disturbance” agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer’s uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter with robust safety guarantees based on forward reachability rollouts. This safety filter can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety filter against worst-case model discrepancy. See https://saferoboticslab.github.io/ISAACS/ for supplementary material.
0 Replies
Loading