Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning?

Rui Wen; Zhengyu Zhao; Zhuoran Liu; Michael Backes; Tianhao Wang; Yang Zhang

Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning?

Rui Wen, Zhengyu Zhao, Zhuoran Liu, Michael Backes, Tianhao Wang, Yang Zhang

Published: 01 Feb 2023, Last Modified: 02 Mar 2023ICLR 2023 notable top 25%Readers: Everyone

Keywords: Data poisoning, adversarial training, indiscriminative features, adaptive defenses, robust vs. non-robust features

Abstract: Indiscriminate data poisoning can decrease the clean test accuracy of a deep learning model by slightly perturbing its training samples. There is a consensus that such poisons can hardly harm adversarially-trained (AT) models when the adversarial training budget is no less than the poison budget, i.e., $\epsilon_\mathrm{adv}\geq\epsilon_\mathrm{poi}$. This consensus, however, is challenged in this paper based on our new attack strategy that induces \textit{entangled features} (EntF). The existence of entangled features makes the poisoned data become less useful for training a model, no matter if AT is applied or not. We demonstrate that for attacking a CIFAR-10 AT model under a reasonable setting with $\epsilon_\mathrm{adv}=\epsilon_\mathrm{poi}=8/255$, our EntF yields an accuracy drop of $13.31\%$, which is $7\times$ better than existing methods and equal to discarding $83\%$ training data. We further show the generalizability of EntF to more challenging settings, e.g., higher AT budgets, partial poisoning, unseen model architectures, and stronger (ensemble or adaptive) defenses. We finally provide new insights into the distinct roles of non-robust vs. robust features in poisoning standard vs. AT models and demonstrate the possibility of using a hybrid attack to poison standard and AT models simultaneously. Our code is available at~\url{https://github.com/WenRuiUSTC/EntF}.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)

TL;DR: We propose an indiscriminative feature-based poisoning approach to substantially degrade adversarial training, which was previously considered to be impossible.

25 Replies

Loading