Keywords: Backdoor Detection, Backdoor Defense, Backdoor Learning, Trustworthy ML, AI Security
TL;DR: We find that the feature difference between benign and poisoned samples tends to reach its maximum at a critical layer; based on this observation, we propose a simple yet effective method that filters poisoned samples by analyzing the features at that layer.
Abstract: Training well-performing deep neural networks (DNNs) usually requires massive training data and computational resources, which may be unaffordable for some users. Accordingly, users may prefer to outsource the training process to a third party or directly exploit publicly available pre-trained models. Unfortunately, doing so exposes DNNs to a dangerous training-time threat, the backdoor attack. Most existing backdoor detectors filter poisoned samples based on the latent feature representations generated by convolutional layers. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to reach its maximum at a critical layer, which is not always the layer typically used in existing defenses, namely the one immediately before the fully-connected layers. In particular, this critical layer can be easily located based on the behaviors of benign samples. Based on this finding, we propose a simple yet effective method that filters poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our method.
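Illustrative sketch (not part of the submission's stated method): the abstract describes a pipeline of extracting per-layer features, locating a critical layer from benign behaviors, and filtering suspicious samples at that layer. The Python sketch below shows one plausible instantiation, assuming a torchvision ResNet-18, global-average-pooled residual-block outputs, and cosine similarity to a benign class center; the layer granularity, the criterion for locating the critical layer, and the filtering threshold are all illustrative assumptions, not the paper's exact procedure.

    # Minimal sketch of layer-wise feature analysis for poisoned-sample
    # filtering. All concrete choices (layers, metric, threshold) are
    # assumptions for illustration only.
    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18

    model = resnet18(num_classes=10).eval()  # hypothetical suspicious model

    # Hook a few candidate layers (assumed residual-block granularity).
    layer_names = ["layer1", "layer2", "layer3", "layer4"]
    features = {}

    def make_hook(name):
        def hook(module, inp, out):
            # Global-average-pool spatial maps into one vector per sample.
            features[name] = out.mean(dim=(2, 3)).detach()
        return hook

    for name in layer_names:
        getattr(model, name).register_forward_hook(make_hook(name))

    @torch.no_grad()
    def layer_features(x):
        features.clear()
        model(x)
        return {name: features[name] for name in layer_names}

    # benign_x: small clean batch from the target class;
    # suspect_x: samples to be checked. Random placeholders here.
    benign_x = torch.randn(64, 3, 32, 32)
    suspect_x = torch.randn(16, 3, 32, 32)

    benign_feats = layer_features(benign_x)
    centers = {n: f.mean(dim=0, keepdim=True) for n, f in benign_feats.items()}

    # Locate the critical layer from benign behaviors alone: here, the layer
    # where benign samples cluster most tightly around their class center
    # (an assumed criterion).
    benign_sims = {n: F.cosine_similarity(f, centers[n]).mean().item()
                   for n, f in benign_feats.items()}
    critical = max(benign_sims, key=benign_sims.get)

    # Filter: flag suspicious samples whose similarity to the benign class
    # center at the critical layer falls below an assumed threshold.
    suspect_feats = layer_features(suspect_x)
    sims = F.cosine_similarity(suspect_feats[critical], centers[critical])
    flagged = sims < 0.5  # placeholder threshold, not from the paper
    print(f"critical layer: {critical}, flagged {int(flagged.sum())} samples")

The forward-hook approach is used here so the feature comparison works on any off-the-shelf architecture without modifying the model; only the list of hooked layer names would change per network.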
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning