EigenGuard: Backdoor Defense in Eigenspace

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Backdoor Defense, neural network, spectrum
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Deep Neural Networks (DNNs) have shown remarkable performance on various downstream tasks. However, these models are vulnerable to backdoor attacks, in which an adversary poisons the training data so that the poisoned model outputs a target label whenever a predefined trigger is present. Such vulnerabilities make training DNNs on third-party datasets risky and have raised significant safety concerns and studies. Given an unauthorized dataset, it is difficult to train a model on such data without inheriting the backdoored behavior on poisoned samples. In this paper, we first show that constraining the dimension of the feature space during training induces trigger misclassification while preserving performance on natural data. Based on this observation, we propose a novel module called EigenGuard; neural networks naturally trained with this module learn to neglect triggers even when trained on unauthorized datasets. Experiments show that models equipped with EigenGuard achieve better performance on both backdoored and natural examples than previous defense algorithms.
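The submission provides no method details beyond the abstract, so the following is only a minimal sketch of one plausible way the stated idea (constraining the effective dimension of the feature space via its eigenspectrum) could be realized as a training-time regularizer. The function name `effective_rank_penalty`, the target dimension `k`, and the weight `lam` are hypothetical illustrations, not taken from the paper.

```python
import torch

def effective_rank_penalty(features: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Penalize feature variance that lies outside the top-k eigendirections.

    features: (batch, dim) penultimate-layer activations.
    k: assumed target dimensionality of the feature eigenspace (hypothetical).
    """
    centered = features - features.mean(dim=0, keepdim=True)
    # Empirical covariance of the batch features.
    cov = centered.T @ centered / max(features.shape[0] - 1, 1)
    eigvals = torch.linalg.eigvalsh(cov)              # ascending eigenvalues
    tail = eigvals[:-k] if k < eigvals.numel() else eigvals[:0]
    return tail.clamp(min=0).sum()                    # energy outside the top-k directions

# Hypothetical use inside a standard training step:
#   loss = cross_entropy(logits, labels) + lam * effective_rank_penalty(feats)
```

Under this reading, the penalty pushes the classifier to concentrate its features in a low-dimensional eigenspace, leaving little capacity for a separate trigger direction; whether EigenGuard implements the constraint this way is not stated in the abstract.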
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4413