TL;DR: We generate imperceptibly perturbed training images that change a CNN's behavior on a chosen target image whenever a new model is trained on them.
Abstract: We consider a new class of \emph{data poisoning} attacks on neural networks, in which the attacker takes control of a model by making small perturbations to a subset of its training data. We formulate the task of finding poisons as a bi-level optimization problem, which can be solved using methods borrowed from the meta-learning community. Unlike previous poisoning strategies, our meta-poisoning attack can poison networks that are trained from scratch from an initialization unknown to the attacker, and it transfers across training hyperparameters. Further, we show that our attacks are more versatile: they can cause the target image to be misclassified into an arbitrarily chosen class. Our results show attack success rates above 50% when poisoning just 3-10% of the training data.
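To make the bi-level formulation concrete, the sketch below illustrates the general idea in JAX: perturb a few training images (labels unchanged), unroll a few SGD steps of the victim's training on the poisoned data, and take gradients of the attacker's objective (target misclassified into a chosen class) with respect to the perturbations. The tiny linear model, data shapes, and hyperparameters are illustrative placeholders, not the paper's implementation.

```python
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    # Cross-entropy loss of a linear classifier (stand-in for the victim CNN).
    logits = x @ w
    return -jnp.mean(jax.nn.log_softmax(logits)[jnp.arange(y.shape[0]), y])

def inner_train(w, poison_x, poison_y, clean_x, clean_y, lr=0.5, steps=3):
    # Inner problem: unroll a few SGD steps of victim training on clean + poisoned data.
    x = jnp.concatenate([clean_x, poison_x])
    y = jnp.concatenate([clean_y, poison_y])
    for _ in range(steps):
        w = w - lr * jax.grad(loss_fn)(w, x, y)
    return w

def craft_objective(delta, w0, base_x, poison_y, clean_x, clean_y,
                    target_x, target_adv_y, eps=0.05):
    # Outer problem: after the unrolled training on the perturbed poisons,
    # the target image should receive the attacker-chosen label.
    poison_x = base_x + jnp.clip(delta, -eps, eps)   # keep perturbations small
    w = inner_train(w0, poison_x, poison_y, clean_x, clean_y)
    return loss_fn(w, target_x, target_adv_y)

# Toy data (hypothetical shapes purely for illustration).
key = jax.random.PRNGKey(0)
d, n_clean, n_poison, n_cls = 8, 32, 4, 3
clean_x = jax.random.normal(key, (n_clean, d))
clean_y = jax.random.randint(key, (n_clean,), 0, n_cls)
base_x = jax.random.normal(key, (n_poison, d))        # images to be perturbed
poison_y = jnp.zeros((n_poison,), dtype=jnp.int32)    # labels left unchanged
target_x = jax.random.normal(key, (1, d))
target_adv_y = jnp.array([2])                         # attacker-chosen class
w0 = jnp.zeros((d, n_cls))

# Craft the poison perturbations by descending the meta-gradient, i.e. the
# gradient of the outer objective through the unrolled inner training steps.
delta = jnp.zeros_like(base_x)
for _ in range(20):
    g = jax.grad(craft_objective)(delta, w0, base_x, poison_y,
                                  clean_x, clean_y, target_x, target_adv_y)
    delta = delta - 0.1 * g
```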
Code: https://github.com/2350532677/metapoison
Keywords: Adversarial Examples, Poisoning, Backdoor Attacks, Deep Learning