Backdoor or Feature? A New Perspective on Data Poisoning

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
TL;DR: A new theoretical foundation for data poisoning, with a theory-inspired defense algorithm
Abstract: In a backdoor attack, an adversary adds maliciously constructed ("backdoor") examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks---that is, finding and removing the backdoor examples---typically involves viewing these examples as outliers and using techniques from robust statistics to detect and remove them. In this work, we present a new perspective on backdoor attacks. We argue that without structural information on the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data (and thus impossible to "detect" in a general sense). To circumvent this impossibility, we assume that a backdoor attack corresponds to the strongest feature in the training data. Under this assumption---which we make formal---we develop a new framework for detecting backdoor attacks. Our framework naturally gives rise to a corresponding algorithm whose efficacy we show both theoretically and experimentally.
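To make the threat model concrete, the following is a minimal sketch of the kind of backdoor poisoning the abstract describes: a small trigger pattern is stamped onto a subset of training examples, which are then relabeled with an attacker-chosen target class. The trigger shape, poisoning fraction, and function names here are illustrative assumptions, not the paper's specific construction.

```python
import numpy as np

def poison(images, labels, target_label=7, frac=0.05, seed=0):
    """Stamp a small white square ("trigger") onto a random subset of
    images and relabel them as target_label. A model trained on the
    poisoned set learns to associate the trigger with the target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(frac * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 trigger in the bottom-right corner
    labels[idx] = target_label    # flip the poisoned labels to the target
    return images, labels, idx

# Toy data: 100 grayscale 8x8 "images" with labels in {0, ..., 9}.
X = np.zeros((100, 8, 8))
y = np.arange(100) % 10
Xp, yp, idx = poison(X, y)
```

From the paper's perspective, the trigger behaves like any other predictive feature of the target class; the defense framework assumes it is the *strongest* such feature in order to separate it from naturally-occurring ones.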
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning