Distilled Agent DQN for Provable Adversarial Robustness

Matthew Mirman; Marc Fischer; Martin Vechev

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission

Abstract: As deep neural networks have become the state of the art for solving complex reinforcement learning tasks, their susceptibility to perceptual adversarial examples has become a concern. The transferability of adversarial examples is known to enable attacks capable of tricking the agent into bad states. In this work we demonstrate a simple poisoning attack that keeps deep RL from learning and that fools the agent even when it is trained with defense methods commonly used for classification tasks. We then propose an algorithm called DadQN, based on deep Q-networks, which enables the use of stronger defenses, including defenses that enable the first-ever on-line robustness certification of a deep RL agent.

Keywords: reinforcement learning, dqn, adversarial examples, robustness analysis, adversarial defense, robust learning, robust rl

TL;DR: We introduce a way of (provably) defending deep RL against adversarial perturbations, including against a new poisoning attack that we present.
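The abstract does not describe how on-line certification is carried out. As a hedged illustration of what certifying a single action choice can mean, the sketch below assumes a small fully connected ReLU Q-network and an L∞ ball of radius `eps` around the observed state, and uses interval (box) bound propagation to check that no perturbation in that ball can change the greedy action. The functions `forward`, `q_value_bounds`, `certify_action` and the toy network are hypothetical; this is not presented as DadQN's actual certification procedure.

```python
from typing import List, Tuple

Interval = Tuple[float, float]               # (lower, upper) bound
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, bias)


def affine_interval(weights: List[List[float]], bias: List[float],
                    box: List[Interval]) -> List[Interval]:
    """Soundly bound y = W x + b when each x_j lies in box[j]."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (xl, xu) in zip(row, box):
            # A positive weight maps the input's lower bound to the output's
            # lower bound; a negative weight swaps the endpoints.
            if w >= 0:
                lo += w * xl
                hi += w * xu
            else:
                lo += w * xu
                hi += w * xl
        out.append((lo, hi))
    return out


def relu_interval(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]


def forward(layers: List[Layer], state: List[float]) -> List[float]:
    """Concrete Q-values: ReLU after every layer except the last."""
    x = list(state)
    for i, (w, b) in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) + bi
             for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def q_value_bounds(layers: List[Layer], state: List[float],
                   eps: float) -> List[Interval]:
    """Bounds on every Q-value over the L-infinity ball of radius eps."""
    box = [(x - eps, x + eps) for x in state]
    for i, (w, b) in enumerate(layers):
        box = affine_interval(w, b, box)
        if i < len(layers) - 1:
            box = relu_interval(box)
    return box


def certify_action(layers: List[Layer], state: List[float],
                   eps: float) -> Tuple[int, bool]:
    """Return the greedy action and whether it provably cannot change."""
    q = forward(layers, state)
    action = max(range(len(q)), key=lambda a: q[a])
    bounds = q_value_bounds(layers, state, eps)
    lower_chosen = bounds[action][0]
    certified = all(lower_chosen > bounds[a][1]
                    for a in range(len(bounds)) if a != action)
    return action, certified


# Hypothetical toy network: 2 state features, 2 hidden units, 2 actions.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
print(certify_action(layers, [1.0, 0.2], 0.1))
print(certify_action(layers, [1.0, 0.2], 0.5))
```

Because box bounds are sound but can be loose, a `False` result only means the action could not be certified at that radius; it does not prove that an adversarial perturbation exists.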
