Vulnerability Analysis for Safe Reinforcement Learning in Cyber-Physical Systems

Shixiong Jiang, Mengyu Liu, Fanxin Kong

Published: 2024, Last Modified: 30 Sept 2024. Venue: ICCPS 2024. License: CC BY-SA 4.0

Abstract: Safe reinforcement learning (RL) has recently been employed to train control policies that maximize the task reward while satisfying safety constraints in a simulated, secure cyber-physical environment. However, the vulnerability of safe RL in adversarial settings has barely been studied. We argue that understanding the safety vulnerability of learned control policies is essential to achieving true safety in the physical world. To fill this research gap, we first formally define the adversarial safe RL problem and show that the optimal policies are vulnerable under observation perturbations. Then, we propose novel safety violation attacks that induce unsafe behaviors using adversarial models trained with reversed safety constraints. Finally, we show both theoretically and experimentally that our method is more effective at violating safety than existing adversarial RL work, which only seeks to decrease the task reward rather than to violate safety constraints.
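
The difference between a reward-decreasing attack and a safety-violating attack can be illustrated with a small toy sketch. This is not the authors' method: the paper trains adversarial models with reversed safety constraints, whereas the code below simply brute-forces a constant, bounded observation offset on a hand-built 1D system. All names and numbers (`TARGET`, `LIMIT`, `MAX_STEP`, `policy`, `rollout`, `best_constant_attack`, the budget `eps = 0.2`) are hypothetical and chosen only for illustration.

```python
# Toy 1D system (all values are illustrative assumptions, not from the paper).
TARGET = 0.9    # position the controller tries to reach
LIMIT = 1.0     # safety bound: any state x > LIMIT counts as unsafe
MAX_STEP = 0.5  # actuator limit per step


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def policy(obs):
    """Fixed stand-in for a learned safe policy: move toward TARGET."""
    return clip(TARGET - obs, -MAX_STEP, MAX_STEP)


def rollout(delta, steps=20):
    """Run one episode where the policy sees x + delta instead of x.

    Returns (total task reward, total safety cost).
    """
    x = 0.0
    total_reward, total_cost = 0.0, 0
    for _ in range(steps):
        obs = x + delta                  # perturbed observation
        x = x + policy(obs)              # true state update
        total_reward += -abs(x - TARGET)  # task reward: stay near TARGET
        total_cost += int(x > LIMIT)      # safety cost: 1 per unsafe step
    return total_reward, total_cost


def best_constant_attack(eps, objective):
    """Pick the constant offset in {-eps, 0, +eps} maximizing the objective.

    `objective` maps (total_reward, total_cost) to the attacker's score.
    """
    candidates = [-eps, 0.0, eps]
    return max(candidates, key=lambda d: objective(*rollout(d)))


if __name__ == "__main__":
    eps = 0.2
    reward_attack = best_constant_attack(eps, lambda r, c: -r)  # minimize reward
    safety_attack = best_constant_attack(eps, lambda r, c: c)   # maximize cost
    print("no attack:    ", rollout(0.0))
    print("reward attack:", reward_attack, rollout(reward_attack))
    print("safety attack:", safety_attack, rollout(safety_attack))
```

In this toy setting, the reward-minimizing attacker settles on an offset that keeps the system short of the target and therefore never crosses the safety bound, while the cost-maximizing attacker picks the offset that makes the controller overshoot into the unsafe region. The example only shows why the two objectives can select different perturbations under the same budget; it says nothing about the magnitude of the effect reported in the paper.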