Vulnerability Analysis of Safe Reinforcement Learning via Inverse Constrained Reinforcement Learning
Keywords: Safe reinforcement learning, inverse constrained reinforcement learning, adversarial attack, vulnerability analysis
TL;DR: We explore vulnerabilities of Safe RL applications under a minimal-knowledge assumption via inverse constrained RL.
Abstract: Safe reinforcement learning (Safe RL) aims to ensure policy performance while satisfying safety constraints. However, most existing Safe RL methods assume benign environments, making them vulnerable to the adversarial perturbations commonly encountered in real-world settings. In addition, existing gradient-based adversarial attacks typically require access to the victim policy's gradients, which are rarely available in practice. To address these challenges, we propose a vulnerability analysis framework for Safe RL policies via inverse constrained reinforcement learning (ICRL). Our approach requires only a set of expert demonstrations to learn both the safety constraints and a learner policy, which are then used to generate adversarial attacks capable of inducing safety violations in Safe RL policies. A theoretical analysis establishes the feasibility of our attack method and provides bounds for it. Experiments on multiple Safe RL benchmarks demonstrate the effectiveness of our approach.
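The pipeline the abstract describes (learn a constraint model and a surrogate learner policy from expert demonstrations, then attack the victim through the surrogate) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption-laden illustration rather than the paper's implementation: `ConstraintNet`, `icrl_constraint_loss`, and `craft_attack` are hypothetical names, the binary-classification surrogate stands in for the paper's actual ICRL objective, and the PGD-style observation attack assumes a continuous observation space and a differentiable surrogate policy.

```python
# Hypothetical sketch of the ICRL-based attack pipeline described above;
# not the paper's implementation. Assumes continuous observations and a
# differentiable surrogate (learner) policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConstraintNet(nn.Module):
    """Learned constraint/cost model c_theta(s, a) in (0, 1); values near 1
    mark state-action pairs the model considers unsafe."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def icrl_constraint_loss(cost_net, expert_obs, expert_act, learner_obs, learner_act):
    """Simplified ICRL-style surrogate objective: expert demonstrations are
    treated as feasible (target cost 0) and off-distribution learner rollouts
    as infeasible (target cost 1). The paper's actual objective may differ."""
    c_expert = cost_net(expert_obs, expert_act)
    c_learner = cost_net(learner_obs, learner_act)
    return (F.binary_cross_entropy(c_expert, torch.zeros_like(c_expert)) +
            F.binary_cross_entropy(c_learner, torch.ones_like(c_learner)))

def craft_attack(cost_net, surrogate_policy, obs, eps=0.05, steps=10, lr=0.01):
    """Black-box transfer attack: projected gradient ascent on the learned
    cost through the surrogate policy. The victim Safe RL policy is never
    queried for gradients (the minimal-knowledge assumption)."""
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        act = surrogate_policy(obs + delta)          # surrogate stands in for victim
        cost_net(obs + delta, act).sum().backward()  # push toward unsafe regions
        with torch.no_grad():
            delta += lr * delta.grad.sign()          # FGSM-style ascent step
            delta.clamp_(-eps, eps)                  # project onto L-inf ball
            delta.grad.zero_()
    return (obs + delta).detach()                    # perturbed observation for the victim
```

At deployment time, the perturbed observation would be fed to the victim Safe RL policy; the attack succeeds if the induced actions violate the true (unobserved) safety constraints.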
Primary Area: reinforcement learning
Submission Number: 11985