Backdoor Attacks Against Transformers with Attention Enhancement

Published: 04 Mar 2023, Last Modified: 27 Apr 2023
Venue: ICLR 2023 BANDS Oral
Readers: Everyone
Keywords: natural language processing, Deep Learning
TL;DR: We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention pattern.
Abstract: With the popularity of transformers in natural language processing (NLP) applications, there are growing concerns about their security. Most existing NLP attack methods focus on injecting stealthy trigger words or phrases. In this paper, we instead examine the interior structure of neural networks and the Trojan mechanism. Targeting prominent NLP transformer models, we propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention pattern. TAL significantly improves attack efficacy: it achieves higher attack success rates while requiring a much smaller poisoning rate (i.e., a smaller proportion of poisoned samples). It boosts attack efficacy not only for traditional dirty-label attacks but also for the more challenging clean-label attacks. TAL is compatible with existing attack methods and can be easily adapted to different backbone transformer models.
