Keywords: deep reinforcement learning, robustness, adversarial, safety, generalization, alignment, robust reinforcement learning
Abstract: Policies trained with deep reinforcement learning are being deployed in many different settings, from automated language assistants to biomedical applications. Yet concerns have been raised regarding the robustness and safety of deep reinforcement learning policies. To address these problems, several works have proposed adversarial training methods for deep reinforcement learning and claimed that adversarial training yields safe and robust policies. In this paper, we demonstrate that adversarial deep reinforcement learning is neither safe nor robust. While robust deep reinforcement learning policies can be attacked via black-box adversarial perturbations, our results further demonstrate that standard reinforcement learning policies are more robust than robust deep reinforcement learning policies under natural attacks. Furthermore, this paper highlights that robust deep reinforcement learning policies do not generalize even to the level of standard reinforcement learning.
Submission Number: 36