Abstract: Cyber-Physical Systems (CPS) integrate sensing, control, computation, and networking with physical components and infrastructure connected by the internet. Recent developments in safe reinforcement learning (safe RL) have enhanced the autonomy and reliability of these systems. However, the vulnerability of safe RL to adversarial conditions has received minimal exploration. To truly ensure safety in physical-world applications, it is crucial to understand and address these potential safety weaknesses in learned control policies. In this work, we demonstrate a novel attack that induces unsafe behaviors using adversarial models trained with reversed safety constraints. The experimental results show that the proposed method is more effective than existing approaches.