Distilled Agent DQN for Provable Adversarial Robustness

Matthew Mirman, Marc Fischer, Martin Vechev

Sep 27, 2018 ICLR 2019 Conference Blind Submission
  • Abstract: As deep neural networks have become the state of the art for solving complex reinforcement learning tasks, their susceptibility to perceptual adversarial examples has become a concern. The transferability of adversarial examples is known to enable attacks capable of tricking the agent into bad states. In this work we demonstrate a simple poisoning attack that can keep deep RL from learning and can fool it even when it is trained with defense methods commonly used for classification tasks. We then propose an algorithm called DadQN, based on deep Q-networks, which enables the use of stronger defenses, including defenses that enable the first-ever online robustness certification of a deep RL agent.
  • Keywords: reinforcement learning, dqn, adversarial examples, robustness analysis, adversarial defense, robust learning, robust rl
  • TL;DR: We introduce a way of (provably) defending deep RL against adversarial perturbations, including a new poisoning attack.