Deep Neural Networks (DNNs) have shown remarkable performance in various downstream tasks. However, these models are vulnerable to backdoor attacks, which poison the training data so that the resulting model outputs an attacker-chosen target label whenever a predefined trigger is present. Such vulnerabilities make training DNNs on third-party datasets risky and have raised significant safety concerns and prompted extensive study. Given an unauthorized dataset, it is difficult to train a model without it acquiring the backdoored behavior on poisoned samples. In this paper, we first point out that training neural networks while constraining the dimension of the feature space induces trigger misclassification while preserving performance on natural data. Based on this observation, we propose a novel module called EigenGuard; natural training with this module makes neural networks neglect triggers when trained on unauthorized datasets. Experiments show that models with EigenGuard achieve better performance on both backdoor and natural examples than other defense algorithms.
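To make the threat model concrete, the following is a minimal sketch of patch-style data poisoning of the kind described above: a small fraction of training images receives a fixed trigger and is relabeled to the target label. It illustrates the attack setting only, not EigenGuard itself. The function name `poison_dataset`, the corner-patch trigger, and the parameters `poison_rate`, `patch_size`, and `patch_value` are assumptions made for illustration.

```python
import random


def poison_dataset(images, labels, target_label, poison_rate=0.1,
                   patch_size=2, patch_value=1.0, seed=0):
    """Return poisoned copies of (images, labels) plus the poisoned indices.

    images: list of 2-D lists of pixel values (assumed to lie in [0, 1]).
    The trigger is a hypothetical square patch in the bottom-right corner.
    """
    rng = random.Random(seed)
    n = len(images)
    num_poison = int(n * poison_rate)
    poisoned_idx = sorted(rng.sample(range(n), num_poison))

    # Copy the data so the clean dataset is left untouched.
    new_images = [[row[:] for row in img] for img in images]
    new_labels = list(labels)

    for i in poisoned_idx:
        img = new_images[i]
        height, width = len(img), len(img[0])
        # Stamp the trigger patch onto the image.
        for r in range(height - patch_size, height):
            for c in range(width - patch_size, width):
                img[r][c] = patch_value
        # Relabel the triggered sample to the attacker's target label.
        new_labels[i] = target_label

    return new_images, new_labels, poisoned_idx


# Toy data: ten blank 4x4 "images" with labels cycling through five classes.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(10)]
labels = [i % 5 for i in range(10)]
p_images, p_labels, idx = poison_dataset(
    images, labels, target_label=0, poison_rate=0.2
)
```

A model trained naturally on `p_images` and `p_labels` may learn to associate the patch with the target label, which is the behavior a defense in this setting aims to suppress while keeping accuracy on clean samples.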