Strong Baseline Defenses Against Clean-Label Poisoning Attacks

Neal Gupta; W. Ronny Huang; Liam Fowl; Chen Zhu; Soheil Feizi; Tom Goldstein; John Dickerson

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Withdrawn Submission | Readers: Everyone

TL;DR: We present effective defenses to clean-label poisoning attacks.

Abstract: Targeted clean-label poisoning is a type of adversarial attack on machine learning systems in which the adversary injects a few correctly labeled, minimally perturbed samples into the training data, causing the deployed model to misclassify a particular test sample during inference. Although defenses have been proposed for general poisoning attacks (those that aim to reduce overall test accuracy), no reliable defense against clean-label attacks has been demonstrated, despite the attacks' effectiveness and realistic use cases. In this work, we propose a set of simple yet highly effective defenses against these attacks. We test our proposed approach against two recently published clean-label poisoning attacks, both of which use the CIFAR-10 dataset. After reproducing their experiments, we demonstrate that our defenses detect over 99% of poisoning examples in both attacks and remove them without any compromise in model performance. Our simple defenses show that current clean-label poisoning attack strategies can be annulled, and they serve as a strong but simple-to-implement baseline defense against which to test future clean-label poisoning attacks.

Keywords: poisoning, defenses, robustness, adversarial, ML security, ML safety

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/strong-baseline-defenses-against-clean-label/code)
