Keywords: Data poisoning, finetuning, cross-class, clean target data present, restricted data access, gradient-matching
Abstract: As deep learning relies on huge datasets for training, poisoning attacks that pollute those datasets pose a significant threat to its security. With more models now pretrained on private corpora inaccessible to external parties, earlier attacks that demand access to the base training datasets have largely lost their impact; practical threats instead focus on the finetuning stage, where attackers can precisely target specific (intended) classes by manipulating the small subset of the dataset under their control. Fortunately, the substantially reduced data volume also puts attackers at risk of exposure: for example, the correlation between contributor identities and the classes they provide can reveal the attacker. To enable stealthy poisoning, we introduce XPoison, which strategically performs poisoning in a cross-class manner. Instead of directly poisoning the intended classes, an XPoison attacker only needs to provide data for unintended classes and thereby hides their identity. We first propose a magnitude matching strategy to align the malicious gradients more efficiently. Furthermore, we estimate the contradiction introduced by clean target data and compensate for it gradient-wise, counteracting its neutralizing influence on the poisoning effect. Through extensive evaluations, we demonstrate that XPoison robustly reduces the recognition accuracy of targeted classes by up to 38.37% during finetuning, while preserving high accuracy on the poison classes.
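For intuition only, the PyTorch-style sketch below illustrates a generic gradient-matching poisoning objective augmented with a magnitude-alignment term, in the spirit of the strategy named in the abstract; it is not the paper's implementation, and the function names, batch variables, and the 0.1 weighting are purely hypothetical.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, criterion, poison_batch, target_batch):
    """Illustrative gradient-matching objective with a magnitude-alignment term.

    Hypothetical sketch: `poison_batch` holds perturbed samples from the
    unintended (poison) classes supplied by the attacker; `target_batch`
    encodes the adversarial objective on the intended (target) classes.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the attacker wants the victim's finetuning to follow.
    x_t, y_t = target_batch
    target_loss = criterion(model(x_t), y_t)
    g_target = torch.autograd.grad(target_loss, params, create_graph=False)

    # Gradient actually induced by the poisoned samples
    # (create_graph=True so the loss can backpropagate into the perturbations).
    x_p, y_p = poison_batch
    poison_loss = criterion(model(x_p), y_p)
    g_poison = torch.autograd.grad(poison_loss, params, create_graph=True)

    # Directional alignment (standard gradient matching) ...
    cos = sum(F.cosine_similarity(gp.flatten(), gt.flatten(), dim=0)
              for gp, gt in zip(g_poison, g_target)) / len(g_poison)
    # ... plus a magnitude term so the poison gradients also match in scale.
    mag = sum((gp.norm() - gt.norm()).abs() for gp, gt in zip(g_poison, g_target))

    return (1.0 - cos) + 0.1 * mag  # 0.1 is an illustrative weight, not from the paper
```

In this kind of setup the returned loss would be minimized with respect to the perturbations on `x_p` (e.g., under an L-infinity budget), while the victim model's parameters stay fixed during poison crafting.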
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 25143