Scalable Backdoor Detection in Neural Networks

Haripriya Harikumar, Vuong Le, Santu Rana, Sunil Gupta, Svetha Venkatesh

09 Nov 2022 (modified: 18 Jul 2025), OpenReview Archive Direct Upload

Abstract: Deep learning models have recently been shown to be vulnerable to Trojan attacks. In a Trojan attack, an attacker installs a backdoor during training so that the model misclassifies samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel trigger reverse-engineering approach whose computational complexity does not scale with the number of labels and which relies on a measure that is both interpretable and universal across different networks and patch types. In experiments, our method achieves a perfect score in separating Trojan models from pure models, an improvement over the current state-of-the-art method.
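The abstract does not say how the trigger is shaped or how the attacker poisons the training data. The Python below is a minimal sketch of the threat model it describes, under loudly stated assumptions: a BadNets-style setup in which a fixed square patch is pasted at a fixed location, and the poisoned samples are relabeled to an attacker-chosen target class. The names `stamp_trigger` and `poison_dataset` and all of their parameters are hypothetical. This is an illustration of the attack only, not the authors' code and not their detection method.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows of pixel intensities in [0, 1].
Image = List[List[float]]


def stamp_trigger(image: Image, patch: Image, row: int, col: int) -> Image:
    """Return a copy of `image` with `patch` pasted so its top-left corner is at (row, col)."""
    if row + len(patch) > len(image) or col + len(patch[0]) > len(image[0]):
        raise ValueError("trigger patch does not fit inside the image")
    out = [r[:] for r in image]  # copy rows so the clean image is left unchanged
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[row + i][col + j] = value
    return out


def poison_dataset(
    samples: List[Tuple[Image, int]],
    patch: Image,
    target_label: int,
    fraction: float,
    row: int,
    col: int,
) -> List[Tuple[Image, int]]:
    """Stamp the trigger on the first `fraction` of samples and relabel them to `target_label`.

    Hypothetical poisoning scheme: the remaining samples keep their clean image and label.
    """
    n_poison = int(len(samples) * fraction)
    poisoned = []
    for idx, (img, label) in enumerate(samples):
        if idx < n_poison:
            poisoned.append((stamp_trigger(img, patch, row, col), target_label))
        else:
            poisoned.append((img, label))
    return poisoned


if __name__ == "__main__":
    clean = [[0.0] * 4 for _ in range(4)]  # 4x4 all-black image
    trigger = [[1.0, 1.0], [1.0, 1.0]]     # 2x2 white square trigger
    data = [(clean, 0), (clean, 1), (clean, 2), (clean, 3)]
    poisoned = poison_dataset(data, trigger, target_label=7, fraction=0.5, row=2, col=2)
    print([label for _, label in poisoned])  # labels after poisoning
```

A model trained on such data tends to learn the shortcut "trigger present implies target label", which is the behavior that trigger reverse-engineering approaches try to recover from a trained network.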
