Contrastive Learning and Feature Space Tactics: A Dual Approach to Strengthen Backdoor Attacks

Published: 2025, Last Modified: 04 May 2026, IEEE Trans. Inf. Forensics Secur. 2025, CC BY-SA 4.0
Abstract: Backdoor attacks are a security threat to deep learning, where attackers inject malicious trigger features into the training data. This causes the model to behave normally during regular operations but produce predetermined incorrect outputs when specific trigger conditions are met. Current advanced text backdoor attack methods use grammar or text style as invisible backdoor trigger features. Although these methods are highly stealthy, their attack performance is poor, and they struggle to counter defenses based on fine-tuning strategies. In this paper, we propose a new multitask backdoor attack framework (CLaFS) for pretrained language models, which uses supervised contrastive learning and feature space isolation auxiliary tasks to increase textual backdoor attack performance. Supervised contrastive learning can enhance the ability of the auxiliary task to learn from poisoned samples, improving backdoor attack effectiveness through parameter sharing. The feature space isolation task enhances the sensitivity of the model to backdoor trigger features by separating poisoned data from other types of data in the feature space, reducing the model’s resistance to backdoor attacks. In addition, we propose a special attack method called Zero Poison Attack, which aims to indirectly achieve backdoor embedding without contaminating the training data of the target task. The experimental results show that our proposed methods significantly improve the performance of invisible textual backdoor attacks and perform well in various special attack scenarios, demonstrating good generalizability and robustness.
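To make the supervised contrastive component more concrete, here is a minimal sketch of the generic supervised contrastive loss in plain Python. It is not the CLaFS objective, whose exact form the abstract does not give. The function name `supcon_loss`, the temperature value, the toy embeddings, and the choice to treat poisoned samples as their own label (2) are all assumptions made for illustration.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss, averaged over anchors
    that have at least one positive (same-label) sample.

    Illustrative only: this is the standard formulation, not the
    paper's specific objective.
    """
    # L2-normalise every embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Mean negative log-probability of the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy 2-D embeddings (hypothetical): three well-separated pairs.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
clustered_labels = [0, 0, 1, 1, 2, 2]   # label 2 stands in for poisoned samples
scattered_labels = [0, 1, 2, 0, 1, 2]   # same points, labels ignore the geometry

print(supcon_loss(points, clustered_labels))
print(supcon_loss(points, scattered_labels))
```

The loss is lower when samples sharing a label sit close together and away from the other labels. Under the assumption made here, giving poisoned samples their own label pushes them into a separate region of the feature space.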