Representation Confusion: Towards Representation Backdoor on CLIP via Concept Activation

Lijie Hu; Junchi Liao; Weimin Lyu; Shaopeng Fu; Tianhao Huang; Shu Yang; Guimin Hu; Di Wang

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0

Keywords: concept, backdoor, CLIP

Abstract: Backdoor attacks pose a significant threat to deep learning models, allowing attackers to stealthily embed hidden triggers that can be exploited during inference. Traditional backdoor attacks typically rely on inserting external patches or perturbations into input data as triggers. However, two key challenges remain: how to evade detection by defense mechanisms, and how to reduce the computational cost of trigger insertion. To address these challenges and design more advanced backdoor techniques, we first explore the underlying mechanisms of backdoor attacks through the lens of cognitive neuroscience, drawing parallels between model decision-making and human cognitive processes. We conceptualize the decision process elicited by a backdoor trigger as movement between representation spaces (i.e., learned concepts). Under this view, existing methods can be seen as implicit manipulations of these stored concepts. This raises a key question: Why not manipulate the concepts explicitly? Could the inherent concepts in the model's reasoning serve as an "internal trigger" for the backdoor? Motivated by this, we propose a novel backdoor attack framework, Representation Confusion (RepConfAttack), which explicitly manipulates concepts within the model's representation spaces. This approach eliminates the need for external backdoor triggers and enhances stealthiness by making the attack harder to detect with traditional defenses. Experimental results demonstrate the effectiveness of our method, which achieves high attack success rates even against robust defense mechanisms.

Primary Area: applications to computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8165
