Keywords: Backdoor Detection, Backdoor Defense, Backdoor Learning, Trustworthy ML, AI Security
TL;DR: We find that the feature difference between benign and poisoned samples tends to reach its maximum at a critical layer; based on this observation, we propose a simple yet effective method that filters poisoned samples by analyzing the features at that layer.
Abstract: Training well-performing deep neural networks (DNNs) usually requires massive training data and computational resources, which may be unaffordable for some users. Accordingly, users may prefer to outsource the training process to a third party or directly exploit publicly available pre-trained models. Unfortunately, doing so exposes DNNs to a dangerous training-time threat, the backdoor attack. Most existing backdoor detectors filter poisoned samples based on the latent feature representations generated by convolutional layers. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to reach its maximum at a critical layer, which is not always the layer typically used in existing defenses, namely the one immediately before the fully-connected layers. In particular, this critical layer can be easily located based on the behaviors of benign samples. Based on this finding, we propose a simple yet effective method that filters poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our method.
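Illustrative sketch (not part of the submission's stated method): the abstract describes a pipeline of extracting per-layer features, locating a critical layer from benign behaviors, and filtering suspicious samples at that layer. The Python sketch below shows one plausible instantiation, assuming a torchvision ResNet-18, global-average-pooled residual-block outputs, and cosine similarity to a benign class center; the layer granularity, the criterion for locating the critical layer, and the filtering threshold are all illustrative assumptions, not the paper's exact procedure.

    # Minimal sketch of layer-wise feature analysis for poisoned-sample
    # filtering. All concrete choices (layers, metric, threshold) are
    # assumptions for illustration only.
    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18

    model = resnet18(num_classes=10).eval()  # hypothetical suspicious model

    # Hook a few candidate layers (assumed residual-block granularity).
    layer_names = ["layer1", "layer2", "layer3", "layer4"]
    features = {}

    def make_hook(name):
        def hook(module, inp, out):
            # Global-average-pool spatial maps into one vector per sample.
            features[name] = out.mean(dim=(2, 3)).detach()
        return hook

    for name in layer_names:
        getattr(model, name).register_forward_hook(make_hook(name))

    @torch.no_grad()
    def layer_features(x):
        features.clear()
        model(x)
        return {name: features[name] for name in layer_names}

    # benign_x: small clean batch from the target class;
    # suspect_x: samples to be checked. Random placeholders here.
    benign_x = torch.randn(64, 3, 32, 32)
    suspect_x = torch.randn(16, 3, 32, 32)

    benign_feats = layer_features(benign_x)
    centers = {n: f.mean(dim=0, keepdim=True) for n, f in benign_feats.items()}

    # Locate the critical layer from benign behaviors alone: here, the layer
    # where benign samples cluster most tightly around their class center
    # (an assumed criterion).
    benign_sims = {n: F.cosine_similarity(f, centers[n]).mean().item()
                   for n, f in benign_feats.items()}
    critical = max(benign_sims, key=benign_sims.get)

    # Filter: flag suspicious samples whose similarity to the benign class
    # center at the critical layer falls below an assumed threshold.
    suspect_feats = layer_features(suspect_x)
    sims = F.cosine_similarity(suspect_feats[critical], centers[critical])
    flagged = sims < 0.5  # placeholder threshold, not from the paper
    print(f"critical layer: {critical}, flagged {int(flagged.sum())} samples")

The forward-hook approach is used here so the feature comparison works on any off-the-shelf architecture without modifying the model; only the list of hooked layer names would change per network.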
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning