Supplementary Material: pdf
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: backdoor defense, backdoor learning, trustworthy AI, AI security
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a novel method to defend against backdoor attacks. The method filters and relabels poisoned samples using generative modelling.
Abstract: Data-poisoning attacks change a small portion of the training dataset by introducing hand-crafted triggers and rewiring the corresponding labels towards a desired target class. Training on such data injects a backdoor into the model that causes incorrect inference on selected test examples. Existing defenses mitigate the risks of such attacks through various modifications of the standard discriminative learning procedure. This paper explores a different approach that promises clean models by means of per-class generative modelling. We start by mapping the input data into a suitable latent space by leveraging a pre-trained self-supervised feature extractor. Interestingly, these representations are either preserved or heavily disturbed under recent backdoor attacks. In both cases, we find that per-class generative models give rise to probability densities that allow us both to detect the poisoned data and to find their original classes. This allows us to patch the poisoned dataset by restoring the original labels and treating the triggers as a kind of augmentation. Our experiments show that training on patched datasets greatly reduces the attack success rate and retains the clean accuracy.
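The detect-and-relabel idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the latent features have already been computed by some self-supervised extractor, and it swaps in a simple per-class Gaussian as a stand-in for whatever generative model the paper uses. The function names (`fit_class_gaussians`, `log_density`, `patch_labels`) and the `quantile` threshold are hypothetical choices made for illustration only.

```python
import numpy as np

def fit_class_gaussians(features, labels, reg=1e-3):
    """Fit one Gaussian per (possibly poisoned) class label.

    Assumption: a Gaussian is only a stand-in for a per-class generative model.
    """
    models = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # Small diagonal term keeps the covariance invertible.
        cov = np.cov(x, rowvar=False) + reg * np.eye(x.shape[1])
        models[int(c)] = (mean, cov)
    return models

def log_density(x, mean, cov):
    """Log-density of each row of x under a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2 * np.pi))

def patch_labels(features, labels, models, quantile=0.05):
    """Flag low-density samples and relabel them to the most likely class.

    Assumption: a single global quantile on the own-class log-density is used
    as the detection threshold; this rule is hypothetical.
    """
    classes = sorted(models)
    scores = np.stack(
        [log_density(features, *models[c]) for c in classes], axis=1
    )
    # Score of each sample under the model of its assigned label.
    own = scores[np.arange(len(labels)), np.searchsorted(classes, labels)]
    suspicious = own < np.quantile(own, quantile)
    patched = labels.copy()
    patched[suspicious] = np.array(classes)[scores[suspicious].argmax(axis=1)]
    return suspicious, patched
```

In this sketch, a sample whose features are unlikely under the density of its assigned class is treated as suspicious, and its label is replaced by the class whose density explains it best. The original label array is left unchanged, so the patched labels can be compared against it.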
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9081