Characterizing Attacks on Deep Reinforcement Learning

Sep 27, 2018 ICLR 2019 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Deep Reinforcement learning (DRL) has achieved great success in various applications, such as playing computer games and controlling robotic manipulation. However, recent studies show that machine learning models are vulnerable to adversarial examples, which are carefully crafted instances that aim to mislead learning models to make arbitrarily incorrect prediction, and raised severe security concerns. DRL has been attacked by adding perturbation to each observed frame. However, such observation based attacks are not quite realistic considering that it would be hard for adversaries to directly manipulate pixel values in practice. Therefore, we propose to understand the vulnerabilities of DRL from various perspectives and provide a throughout taxonomy of adversarial perturbation against DRL, and we conduct the first experiments on unexplored parts of this taxonomy. In addition to current observation based attacks against DRL, we propose attacks based on the actions and environment dynamics. Among these experiments, we introduce a novel sequence-based attack to attack a sequence of frames for real-time scenarios such as autonomous driving, and the first targeted attack that perturbs environment dynamics to let the agent fail in a specific way. We show empirically that our sequence-based attack can generate effective perturbations in a blackbox setting in real time with a small number of queries, independent of episode length. We conduct extensive experiments to compare the effectiveness of different attacks with several baseline attack methods in several game playing, robotics control, and autonomous driving environments.
0 Replies