Programmable Neural Network Trojan for Pre-trained Feature Extractor

25 Sep 2019 (modified: 24 Dec 2019) | ICLR 2020 Conference Blind Submission | Readers: Everyone
  • Abstract: Neural network (NN) trojaning is an emerging and important class of attack that can broadly damage systems deployed with NN models. Unlike an adversarial attack, it hides malicious functionality in the weight parameters of the NN model. Existing studies have explored NN trojaning attacks on small datasets in specific domains, with a limited number of fixed target classes. In this paper, we propose a more powerful trojaning attack for large models, which outperforms existing studies in capability, generality, and stealthiness. First, the attack is programmable: the malicious misclassification target is not fixed and can be generated on demand, even after the victim's deployment. Second, our trojaning attack is not limited to a small domain; one trojaned model trained on a large-scale dataset can affect applications in different domains that reuse its general features. Third, our trojan shows no biased behavior for different target classes, which makes it more difficult to defend against.
  • Keywords: Neural Network, Trojan, Security
  • TL;DR: We present a more powerful NN trojaning attack that can support outer-scope targets and dynamic targets