Keywords: reinforcement learning, adversarial examples, attack
TL;DR: We propose a new attack for taking full control of neural policies in realistic settings.
Abstract: We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider's policy. It is pre-computed, therefore fast inferred, and could thus be usable in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In the latter, the adversary cannot directly modify the agent's state -its representation of the environment- but can only attack the agent's observation -its perception of the environment. Directly modifying the agent's state would require a write-access to the agent's inner workings and we argue that this assumption is too strong in realistic settings.
Original Pdf: pdf
8 Replies
Loading