Tree-based Action-Manipulation Attack Against Continuous Reinforcement Learning with Provably Efficient Support

22 Sept 2023 (modified: 20 Apr 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning security, adversarial, provably efficient
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A primarily theoretical paper: LCBT, a provably efficient black-box action-manipulation attack against continuous reinforcement learning.
Abstract: Given the widespread application of reinforcement learning, research on adversarial attacks against it is necessary for building secure reinforcement learning applications. However, most current security research focuses on reinforcement learning with discrete states and actions, and those methods cannot be applied directly to reinforcement learning in continuous state and action spaces. In this paper, we investigate attacks on continuous reinforcement learning. Rather than manipulating observations or environments, we focus on action-manipulation attacks, which impose stronger restrictions on the attacker. We study the action-manipulation attack in both white-box and black-box scenarios, and propose a black-box attack method called LCBT, which uses a layered binary tree to progressively refine and partition the continuous action space. Moreover, we prove that, provided the reinforcement learning agent's dynamic regret is sublinear in the total number of steps, LCBT can force the agent to frequently take actions from a specified target policy at only sublinear attack cost. We conduct experiments to evaluate the effectiveness of the LCBT attack on three widely used reinforcement learning algorithms: DDPG, PPO, and TD3.
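
Since the submission itself includes no code, the following is a minimal, hypothetical sketch of the layered-binary-tree idea the abstract describes: recursively bisecting a one-dimensional continuous action interval and refining only well-visited regions, so that an attacker can localize the part of the action space to steer the agent toward. All names (`ActionTree`, `Node`, `split_after`), the payoff signal, and the refinement rule are illustrative assumptions, not the paper's actual LCBT algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One region of the action space at one layer of the tree (assumed structure)."""
    lo: float
    hi: float
    depth: int
    visits: int = 0
    value: float = 0.0                 # running mean payoff observed in this region
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def mid(self) -> float:
        return 0.5 * (self.lo + self.hi)

    def split(self) -> None:
        """Refine this region into two half-intervals (one tree layer deeper)."""
        if self.left is None:
            self.left = Node(self.lo, self.mid, self.depth + 1)
            self.right = Node(self.mid, self.hi, self.depth + 1)


class ActionTree:
    """Layered binary partition of a 1-D continuous action space [lo, hi]."""

    def __init__(self, lo: float, hi: float, split_after: int = 32):
        self.root = Node(lo, hi, depth=0)
        self.split_after = split_after  # assumed visit threshold before refining

    def descend(self, action: float) -> Node:
        """Walk from the root to the finest region containing `action`."""
        node = self.root
        while node.left is not None:
            node = node.left if action < node.mid else node.right
        return node

    def update(self, action: float, payoff: float) -> None:
        """Record one observed (action, payoff) pair; refine a region once it has
        been visited often enough to trust its payoff estimate."""
        node = self.descend(action)
        node.visits += 1
        node.value += (payoff - node.value) / node.visits
        if node.visits >= self.split_after:
            node.split()


if __name__ == "__main__":
    tree = ActionTree(-1.0, 1.0, split_after=16)
    # Toy payoff: the attacker's target action region is near a = 0.7.
    for t in range(400):
        a = (t % 40) / 20.0 - 1.0      # sweep actions across [-1, 1)
        tree.update(a, payoff=-abs(a - 0.7))
    leaf = tree.descend(0.7)
    print(f"finest region around 0.7: [{leaf.lo:.3f}, {leaf.hi:.3f}], depth {leaf.depth}")
```

In a d-dimensional action space the same idea would bisect one coordinate per layer; the paper's sublinear-cost guarantee concerns the full attack against a learning agent, which this sketch does not implement.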
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5306