MetaPoison: Learning to craft adversarial poisoning examples via meta-learning

Sep 25, 2019 · ICLR 2020 Conference Withdrawn Submission
  • TL;DR: We generate imperceptibly perturbed training images that alter a CNN's behavior on a chosen target image, even when the network is retrained from scratch.
  • Abstract: We consider a new class of \emph{data poisoning} attacks on neural networks, in which the attacker takes control of a model by making small perturbations to a subset of its training data. We formulate the task of finding poisons as a bi-level optimization problem, which can be solved using methods borrowed from the meta-learning community. Unlike previous poisoning strategies, meta-poisoning can compromise networks that are trained from scratch, from an initialization unknown to the attacker, and it transfers across training hyperparameters. Further, we show that our attacks are more versatile: they can cause the target image to be misclassified as an arbitrarily chosen class. Our results show an attack success rate above 50% when poisoning just 3-10% of the training data.
  • Code: https://github.com/2350532677/metapoison
  • Keywords: Adversarial Examples, Poisoning, Backdoor Attacks, Deep Learning
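The bi-level formulation sketched in the abstract can be illustrated on a toy problem: an inner training step updates the model on the (poisoned) data, and an outer "meta" step adjusts the poison perturbations so that the resulting model misclassifies a chosen target. The snippet below is a minimal sketch under strong simplifying assumptions, not the paper's method: it uses a linear logistic-regression model, a single unrolled inner SGD step at one fixed initialization, and finite differences in place of backpropagating through training; all names (`inner_step`, `outer_loss`, `poison_idx`, `y_adv`) are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inner_step(w, X, y, lr=0.5):
    # One unrolled SGD step of logistic-regression training on (possibly poisoned) data.
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)
    return w - lr * grad

def outer_loss(delta, w0, X, y, poison_idx, x_target, y_adv):
    # Meta-objective: after the inner training step on poisoned data,
    # push the target's prediction toward the attacker-chosen label y_adv.
    Xp = X.copy()
    Xp[poison_idx] += delta
    w1 = inner_step(w0, Xp, y)
    p = sigmoid(x_target @ w1)
    return -(y_adv * np.log(p + 1e-9) + (1 - y_adv) * np.log(1 - p + 1e-9))

# Toy data: two features, label = sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] > 0).astype(float)
x_target = np.array([1.5, 0.0])   # true class 1
y_adv = 0.0                       # attacker wants it classified as 0
poison_idx = np.arange(4)         # perturb 10% of the training set
delta = np.zeros((4, 2))
w0 = np.zeros(2)                  # fixed initialization for this sketch
eps = 1e-5

loss_before = outer_loss(delta, w0, X, y, poison_idx, x_target, y_adv)
for _ in range(200):
    # Finite-difference meta-gradient of the outer loss w.r.t. the poison.
    base = outer_loss(delta, w0, X, y, poison_idx, x_target, y_adv)
    g = np.zeros_like(delta)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            d2 = delta.copy()
            d2[i, j] += eps
            g[i, j] = (outer_loss(d2, w0, X, y, poison_idx, x_target, y_adv) - base) / eps
    # Gradient step on the poison, clipped to keep perturbations small.
    delta = np.clip(delta - 0.1 * g, -0.5, 0.5)
loss_after = outer_loss(delta, w0, X, y, poison_idx, x_target, y_adv)
```

The full attack additionally averages the meta-gradient over many initializations and training stages (which is what makes the poisons work for networks trained from scratch); here only the direction of the effect is shown, with the adversarial loss on the target decreasing as the poison is optimized.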