Adversarial Robustness Poisoning: Increasing Adversarial Vulnerability of the Model via Data Poisoning

Published: 01 Jan 2024, Last Modified: 20 May 2025 · GLOBECOM 2024 · CC BY-SA 4.0
Abstract: Deep neural networks (DNNs) have become prevalent across various domains. However, recent research has revealed their vulnerability to data poisoning attacks, where adversaries inject poisoned data to compromise the usability of the target model. Traditional data poisoning attacks focus on reducing the model's test accuracy, but they can be detected by model performance evaluation or mitigated by data cleaning. In contrast, we propose an Adversarial Robustness Poisoning Scheme (ARPS) that aims to decrease adversarial robustness while preserving the normal functionality of the target model. To achieve ARPS, we first separate the features of the data into robust and non-robust features, where the non-robust features are imperceptible to humans and more sensitive to adversarial perturbations. We then construct a dataset containing only non-robust features, which serves as the poisoning data. For a malicious dataset provider, the poisoned dataset can be constructed by adding the poisoning data to the original dataset. For a malicious model provider, we employ uncertainty-weighted multi-task learning to train the poisoned model, striking a better balance between functionality preservation (high clean accuracy) and attack effectiveness (degraded robustness). Extensive experiments demonstrate the effectiveness of ARPS in weakening adversarial training and amplifying adversarial attacks, as well as its stealthiness in evading defenses such as data cleaning and model fine-tuning. Additionally, we propose potential countermeasures against ARPS, including regularization and data augmentation.
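The non-robust-feature dataset sketched in the abstract can be constructed, for example, in the spirit of prior work on non-robust features (e.g., Ilyas et al., 2019): each input is nudged toward a chosen target class under a small perturbation budget using a standard (non-robust) classifier and then relabeled with that class, so the new label is carried only by imperceptible, non-robust features. The PyTorch sketch below illustrates the idea; the classifier `model`, the target-label assignment, the L2 budget `eps`, and the step count are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def make_nonrobust_example(model, x, target, eps=0.5, steps=100, lr=0.1):
    """Perturb a batch x toward `target` under an L2 budget, then relabel it with `target`."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        # minimizing cross-entropy w.r.t. the target class pulls the prediction toward it
        loss = F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            # project the perturbation back onto the L2 ball of radius eps
            norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12)
            scale = (eps / norm).clamp(max=1.0).view(-1, *([1] * (delta.dim() - 1)))
            delta.mul_(scale)
            # keep the poisoned image inside the valid pixel range [0, 1]
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)
    # the perturbed image is returned with the target label: a non-robust-feature sample
    return (x + delta).detach(), target
```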
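For the uncertainty-weighted multi-task training mentioned in the abstract, a common formulation (Kendall et al., 2018) scales each task loss by a learned homoscedastic-uncertainty term so the competing objectives are balanced automatically. Below is a minimal PyTorch sketch under the assumption that the poisoned-model training combines a clean-accuracy loss and a robustness-degradation loss; the loss names are placeholders, not the paper's exact objectives.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is learned."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])  # 1 / sigma_i^2
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage sketch (names are assumptions): loss_clean preserves accuracy on clean data,
# loss_poison encourages reliance on non-robust features.
# criterion = UncertaintyWeightedLoss(num_tasks=2)
# total_loss = criterion([loss_clean, loss_poison])
# total_loss.backward()
```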