Keywords: Multimodal Large Language Models, Test-Time Backdoor Attacks
TL;DR: We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.
Abstract: Backdoor attacks typically set up a backdoor by contaminating training data or modifying parameters before the model is deployed, such that a predetermined trigger can activate harmful effects during the test phase. Can we, however, carry out test-time backdoor attacks *after* deploying the model? In this work, we present **AnyDoor**, a test-time backdoor attack against multimodal large language models (MLLMs), without accessing training data or modifying parameters. In AnyDoor, the burden of *setting up* backdoors is assigned to the visual modality (better capacity but worse timeliness), while the textual modality is responsible for *activating* the backdoors (better timeliness but worse capacity). This decomposition takes advantage of the characteristics of different modalities, making attacking timing more controllable compared to directly applying adversarial attacks. We empirically validate the effectiveness of AnyDoor against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2, and conduct extensive ablation studies. Notably, AnyDoor can dynamically change its backdoor trigger prompts and/or harmful effects, posing a new challenge for developing backdoor defenses.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4806
Loading