Programmable Neural Network Trojan for Pre-trained Feature Extractor

Yu Ji; Zinxin Liu; Xing Hu; Peiqi Wang; Youhui Zhang

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

Abstract: Neural network (NN) trojaning is an emerging and important attack that can broadly damage systems deployed with NN models. Unlike adversarial attacks, it hides malicious functionality in the weight parameters of NN models. Existing studies have explored NN trojaning attacks on small datasets for specific domains, with a limited number of fixed target classes. In this paper, we propose a more powerful trojaning attack method for large models, which outperforms existing studies in capability, generality, and stealthiness. First, the attack is programmable: the malicious misclassification target is not fixed and can be generated on demand, even after the victim's deployment. Second, our trojaning attack is not limited to a small domain; one trojaned model on a large-scale dataset can affect applications in different domains that reuse its general features. Third, our trojan shows no biased behavior across different target classes, which makes it more difficult to defend against.

Keywords: Neural Network, Trojan, Security

TL;DR: We present a more powerful NN trojaning attack that supports both outer-scope targets and dynamic targets.
