Value-Based Membership Inference Attack on Actor-Critic Reinforcement Learning

Yunhao Yang, Ufuk Topcu


Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone

Keywords: Privacy, Membership Inference Attack, Value Function, Actor-Critic, Reinforcement Learning

Abstract: In actor-critic reinforcement learning (RL), the actor computes candidate policies and the critic computes a value function that evaluates them. Such RL algorithms may be vulnerable to membership inference attacks (MIAs), privacy attacks that infer data membership, i.e., whether a specific data record belongs to the training dataset. We investigate the vulnerability of the value function in actor-critic algorithms to MIAs. We develop CriticAttack, a new MIA that targets black-box RL agents by examining the correlation between the expected reward and the value function. We empirically show that CriticAttack correctly infers approximately 90% of the training data membership, i.e., it achieves 90% attack accuracy. This is far above the 50% accuracy of random guessing, indicating a severe privacy vulnerability of the value function. To defend against CriticAttack, we design a method called CriticDefense that adds uniform noise to the value function. CriticDefense can reduce the attack accuracy to 60% without significantly affecting the agent’s performance.
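
Illustrative sketch (not the authors' implementation): the abstract says only that the attack examines the correlation between the expected reward and the value function, and that the defense adds uniform noise to the value function. The Python below is a minimal toy version under stated assumptions: the attacker scores a trajectory by the Pearson correlation between discounted returns-to-go and the critic's value estimates, and guesses "member" above a threshold. The function names, the Pearson statistic, the threshold of 0.9, and the noise scale are all hypothetical choices made for illustration.

```python
import random


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every step of a trajectory."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def infer_membership(rewards, critic_values, threshold=0.9, gamma=0.99):
    """Assumed decision rule: guess 'member' when returns and values correlate strongly."""
    score = pearson(discounted_returns(rewards, gamma), critic_values)
    return score >= threshold


def noisy_values(critic_values, scale, rng=random):
    """Assumed defense: perturb each value estimate with noise from U(-scale, scale)."""
    return [v + rng.uniform(-scale, scale) for v in critic_values]
```

In this toy setting, a critic whose outputs track the returns of a trajectory closely yields a high score and a "member" guess, while perturbing the outputs with `noisy_values` lowers the correlation the attacker can observe. How the paper actually computes its statistic and calibrates the noise is not stated in the abstract.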


Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)

TL;DR: We introduce a new membership inference attack focusing on the value function of the actor-critic algorithm.

Supplementary Material: zip
