Transform-Enabled Detection of Backdoor Attacks in Deep Neural Networks

TMLR Paper 4008 Authors

18 Jan 2025 (modified: 08 Mar 2025) · Under review for TMLR · CC BY 4.0
Abstract: Deep Neural Networks (DNNs) have been widely deployed in a range of safety-critical applications. Recent work has illustrated their vulnerability to malicious backdoor attacks: when a specific backdoor trigger is applied to an input image, the backdoor induces uncharacteristic behavior in the DNN's hidden layers, causing the DNN to misclassify the input. In this work we present Transform-Enabled Detection of Attacks (TESDA), a novel algorithm for online detection of the uncharacteristic hidden-layer behavior indicative of a backdoor. We leverage the training-set distributions of reduced-dimension transforms of a backdoored DNN's deep features to rapidly detect malicious behavior, using theoretically grounded methods with bounded false-alarm rates. We verify that TESDA achieves state-of-the-art detection with very low latency on a variety of attacks, datasets, and network backbones. Further ablations show that only a small proportion of the DNN's training data is needed for TESDA to fit an attack detector to the backdoored network.
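The abstract does not spell out TESDA's exact transforms or test statistics, but the general recipe it describes (fit the distribution of a reduced-dimension transform of clean deep features, then flag inputs whose transformed features fall outside it, with the threshold set to bound the false-alarm rate) can be sketched as follows. All names, dimensions, and the choice of PCA as the transform are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for hidden-layer activations (not from the paper):
# clean features from the training set, and features of "triggered" inputs
# simulated here as a shifted distribution.
clean_feats = rng.normal(size=(2000, 128))
test_feats = rng.normal(loc=5.0, size=(50, 128))

# 1) Reduced-dimension transform: project onto the top-k principal
#    directions of the clean features (PCA via SVD).
mu = clean_feats.mean(axis=0)
_, _, vt = np.linalg.svd(clean_feats - mu, full_matrices=False)
k = 8
proj = vt[:k].T  # (128, k) projection matrix


def score(x):
    """Squared norm of the reduced-dimension feature, centered on clean data."""
    z = (x - mu) @ proj
    return np.sum(z ** 2, axis=-1)


# 2) Bounded false-alarm rate: threshold at the (1 - alpha) empirical
#    quantile of the clean-data scores, so at most ~alpha of clean
#    inputs are flagged.
alpha = 0.01
tau = np.quantile(score(clean_feats), 1 - alpha)

# 3) Online detection: flag any input whose score exceeds the threshold.
flags = score(test_feats) > tau
print(f"flagged fraction: {flags.mean():.2f}")
```

With the simulated shift above, nearly all triggered inputs land far from the clean distribution in the reduced space and are flagged, while the clean false-alarm rate stays near alpha by construction of the quantile threshold.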
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Chao_Chen1
Submission Number: 4008
