Causality-Based Black-Box Backdoor Detection

17 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Backdoor Defense, Causal Inference, Third-party Models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, in which attackers inject hidden backdoors during the training stage. These attacks pose a serious threat to downstream users who unknowingly use backdoored models from third-party sources (e.g., HuggingFace, ChatGPT). To mitigate backdoor attacks, various backdoor detection methods have been proposed, but most of them require additional access to the model's weights or to validation sets, which are not always available for third-party models. In this paper, we adopt a recently proposed setting in which only input samples and prediction labels are accessible, and the goal is to build a firewall at the user end that identifies backdoor samples and rejects them. To address this challenge, we first provide a novel causality-based perspective for analyzing the heterogeneous prediction behaviors for backdoor and clean samples. Leveraging this causal insight, we then propose a Causality-based Black-Box Backdoor Detection algorithm, which introduces counterfactual samples as an intervention to distinguish backdoor samples from clean ones. Extensive experiments on three benchmark datasets validate the effectiveness and efficiency of our method. Our code is available at https://anonymous.4open.science/r/CaBBD-4326/
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 832
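
Illustrative sketch (not the authors' implementation): the abstract describes a label-only firewall that probes each input with counterfactual samples. The abstract does not say how those counterfactuals are built or how the decision is made, so the Python below is a minimal sketch under stated assumptions: counterfactuals are formed by adding Gaussian noise to the input, and an input is flagged when its predicted label almost never changes under that intervention. The names `label_flip_rate`, `is_suspected_backdoor`, `toy_model`, and all parameter values are hypothetical.

```python
import numpy as np


def label_flip_rate(predict_label, x, n_counterfactuals=32, noise_scale=0.05, seed=0):
    """Fraction of noisy copies of x whose predicted label differs from x's label.

    predict_label is a black-box callable returning only a class label.
    Additive Gaussian noise is an ASSUMED stand-in for the paper's counterfactuals.
    """
    rng = np.random.default_rng(seed)
    base_label = predict_label(x)
    flips = 0
    for _ in range(n_counterfactuals):
        noise = rng.normal(0.0, noise_scale, size=x.shape)
        x_cf = np.clip(x + noise, 0.0, 1.0)  # keep features in the valid [0, 1] range
        flips += int(predict_label(x_cf) != base_label)
    return flips / n_counterfactuals


def is_suspected_backdoor(predict_label, x, threshold=0.05, **kwargs):
    """Flag x when its label is (almost) invariant under the intervention.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return label_flip_rate(predict_label, x, **kwargs) < threshold


def toy_model(x):
    """Hypothetical backdoored classifier used only for this demonstration."""
    if x[-4:].min() > 0.5:  # strong "trigger patch" in the last four features
        return 7            # attacker-chosen target label
    return int(x[:-4].mean() > 0.5)


# A sample sitting on the clean decision boundary, and the same sample with a trigger.
clean = np.full(16, 0.5)
clean[-4:] = 0.0
triggered = clean.copy()
triggered[-4:] = 1.0

print(is_suspected_backdoor(toy_model, clean))
print(is_suspected_backdoor(toy_model, triggered))
```

In this toy setup the trigger dominates the prediction, so the triggered input keeps its label under perturbation while the borderline clean input does not. The heuristic is deliberately crude: a clean input that lies far from every decision boundary would also show a near-zero flip rate, which is one reason a real detector needs a more carefully designed intervention than plain noise.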