BBCaL: Black-box Backdoor Detection under the Causality Lens

Mengxuan Hu; Zihan Guan; Junfeng Guo; Zhongliang Zhou; Jielu Zhang; Sheng Li

Published: 24 Dec 2024, Last Modified: 09 May 2025. Accepted by TMLR. License: CC BY 4.0

Event Certifications: iclr.cc/ICLR/2025/Journal_Track

Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, in which attackers inject hidden backdoors during the training stage. This poses a serious threat to the Model-as-a-Service setting, where downstream users directly use third-party models (e.g., models from the HuggingFace Hub, or ChatGPT). To address this threat, we study the inference-stage black-box backdoor detection problem, where defenders aim to build a firewall that filters out backdoor inputs at inference time with access only to input samples and predicted labels. Existing approaches to this problem either rely on strong assumptions about the types of triggers and attacks or suffer from poor efficiency. To build a more general and efficient method, we first provide a novel causality-based lens for analyzing the heterogeneous prediction behaviors of clean and backdoored samples at inference time, covering both sample-specific and sample-agnostic backdoor attacks. Motivated by this causal analysis and by do-calculus in causal inference, we introduce Black-box Backdoor detection under the Causality Lens (BBCaL), which distinguishes backdoor samples from clean ones by analyzing prediction consistency as counterfactual samples are progressively constructed. Theoretical analysis further sheds light on the effectiveness of BBCaL. Extensive experiments on three benchmark datasets validate the effectiveness and efficiency of our method.
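The abstract describes the detection rule only at a high level. The following is a minimal, generic sketch of a label-only prediction-consistency check written in that spirit. It is not the authors' BBCaL algorithm. Everything specific in it is an assumption: the counterfactual operator (linearly blending the input with unrelated reference inputs at progressively larger weights `alphas`), the premise that backdoored inputs keep their label under such interventions while clean inputs drift, the `threshold` value, and all names (`blend`, `consistency_score`, `flag_backdoor`, `toy_model`).

```python
from typing import Callable, List, Sequence

Sample = List[float]                   # a flattened input, e.g. pixel values in [0, 1]
LabelOracle = Callable[[Sample], int]  # black-box access: input -> predicted label only


def blend(x: Sample, ref: Sample, alpha: float) -> Sample:
    """Hypothetical counterfactual operator (an assumption, not BBCaL's):
    mix x with a reference input, giving the reference weight alpha."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(x, ref)]


def consistency_score(model: LabelOracle, x: Sample,
                      references: Sequence[Sample],
                      alphas: Sequence[float] = (0.2, 0.4, 0.6)) -> float:
    """Fraction of counterfactual variants whose predicted label matches x's."""
    original = model(x)
    agree = total = 0
    for alpha in alphas:       # progressively stronger interventions
        for ref in references:
            total += 1
            if model(blend(x, ref, alpha)) == original:
                agree += 1
    return agree / total


def flag_backdoor(model: LabelOracle, x: Sample,
                  references: Sequence[Sample],
                  threshold: float = 0.9) -> bool:
    """Assumed decision rule: unusually stable predictions suggest a trigger."""
    return consistency_score(model, x, references) >= threshold


def toy_model(x: Sample) -> int:
    """Hypothetical backdoored classifier used only for illustration."""
    if x[0] > 0.3:   # toy trigger feature overrides everything else
        return 0     # attacker's target label
    rest = x[1:]
    return 1 if sum(rest) / len(rest) > 0.5 else 2


refs = [[0.0, 0.1, 0.1, 0.1], [0.0, 0.2, 0.2, 0.2]]  # toy reference inputs
clean = [0.0, 0.9, 0.9, 0.9]
poisoned = [1.0, 0.9, 0.9, 0.9]                      # trigger set in feature 0

print(flag_backdoor(toy_model, poisoned, refs))  # True
print(flag_backdoor(toy_model, clean, refs))     # False
```

In this toy run the trigger feature survives every blend, so the poisoned input keeps its label in all six variants, while the clean input flips at the strongest blend (4 of 6 variants agree). A practical detector would need a principled counterfactual construction and a calibrated threshold; the paper specifies its own procedure for both.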

Certifications: Featured Certification

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
- Added more discussion of related work.
- Added the experiments conducted during the rebuttal period.

Supplementary Material: zip

Assigned Action Editor: Eleni Triantafillou

Submission Number: 3269
