Stealthy Textual Backdoor Attacks via Contrastive Decoding

ACL ARR 2026 January Submission 7061 Authors

06 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Artificial intelligence security, backdoor attacks, pre-trained language model
Abstract: With the widespread adoption of large language models (LLMs), exploring potential attack mechanisms has become crucial for understanding their security risks. Among these, backdoor attacks play an important role. Recently, instead of inserting rare phrases, more work has turned to paraphrasing inputs into specific styles with paraphrase models. Though effective, this strategy still struggles to generate consistent styles for constructing reliable triggers, owing to the inherent generative bias of the paraphrase model. To mitigate this problem, we propose incorporating a contrastive decoding strategy and design a novel Contrastive Decoding-based Attack (CDAttack) for backdoor attacks. Specifically, CDAttack first employs two complementary paraphrasing style prompts (i.e., expert-style and amateur-style) to generate expert-style text and to extract the model's potential generation biases, respectively. CDAttack then applies a contrastive constraint that suppresses the model-generated bias while amplifying expert-style features. Along this line, CDAttack encourages the paraphrase model to produce consistently expert-style text, enabling more reliable backdoor attacks. Extensive experiments on several advanced pre-trained language models across three different tasks demonstrate the effectiveness of CDAttack (e.g., achieving over 21% higher attack success rates than the advanced BGMAttack while using fewer poisoned samples). We also release the code at \url{https://anonymous.4open.science/r/CDAttack}.
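Since the abstract only sketches the mechanism, the following minimal sketch shows how contrastive decoding with an expert-style and an amateur-style prompt could look in practice. The model name, prompt wordings, and the hyperparameters ALPHA and TAU are illustrative assumptions, not the paper's settings; the contrastive score and plausibility cutoff follow the standard contrastive decoding formulation (Li et al., 2023), and the paper's actual constraint may differ. The authors' implementation is in the anonymous repository linked above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-ins (assumptions): a small causal LM as the paraphrase
# model, generic style prompts, and textbook contrastive-decoding settings.
MODEL_NAME = "gpt2"   # stand-in paraphrase model, not the paper's choice
ALPHA = 1.0           # contrast strength (assumed)
TAU = 0.1             # plausibility cutoff, as in Li et al. (2023)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def contrastive_paraphrase(text: str, max_new_tokens: int = 60) -> str:
    # Two complementary style prompts (hypothetical wordings): the expert
    # prompt requests the trigger style; the amateur prompt exposes the
    # model's default generative bias.
    expert = tok(f"Rewrite in a formal biblical style: {text}\nRewrite:",
                 return_tensors="pt").input_ids
    amateur = tok(f"Rewrite: {text}\nRewrite:",
                  return_tensors="pt").input_ids
    out = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            le = model(expert).logits[0, -1]   # expert next-token logits
            la = model(amateur).logits[0, -1]  # amateur next-token logits
        pe = torch.log_softmax(le, dim=-1)
        pa = torch.log_softmax(la, dim=-1)
        # Plausibility mask: keep only tokens whose expert probability is
        # within a factor TAU of the expert's most likely token.
        mask = pe >= pe.max() + torch.log(torch.tensor(TAU))
        # Contrastive score: amplify expert-style features, subtract bias.
        score = torch.where(mask, pe - ALPHA * pa,
                            torch.full_like(pe, float("-inf")))
        nxt = score.argmax().view(1, 1)
        if nxt.item() == tok.eos_token_id:
            break
        expert = torch.cat([expert, nxt], dim=-1)
        amateur = torch.cat([amateur, nxt], dim=-1)
        out.append(nxt.item())
    return tok.decode(out, skip_special_tokens=True)
```

In CDAttack's terms, subtracting the amateur distribution suppresses the paraphrase model's default generative bias, while the plausibility mask keeps generations fluent; the resulting consistently expert-style paraphrases are what serve as the backdoor trigger.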
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: security/privacy
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 7061