Abstract: Backdoor attacks typically set up a backdoor by contaminating training data or modifying parameters before the model is deployed, such that a predetermined trigger can activate harmful effects during the test phase. Can we, however, carry out test-time backdoor attacks after the model has been deployed? In this work, we present **AnyDoor**, a test-time backdoor attack against multimodal large language models (MLLMs) that requires neither access to training data nor modification of model parameters. In AnyDoor, the burden of **setting up** backdoors is assigned to the visual modality (better capacity but worse timeliness), while the textual modality is responsible for **activating** the backdoors (better timeliness but worse capacity). This decomposition exploits the characteristics of the two modalities, making the attack timing more controllable than directly applying adversarial attacks. We empirically validate the effectiveness of AnyDoor against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2, and conduct extensive ablation studies. Notably, AnyDoor can dynamically change its backdoor trigger prompts and/or harmful effects, posing a new challenge for developing backdoor defenses.
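To make the setup/activation decomposition concrete, below is a minimal, hedged sketch of how such a test-time backdoor could be crafted as a universal image perturbation: the perturbation is optimized so that prompts containing a trigger phrase elicit an attacker-chosen output, while trigger-free prompts still receive normal answers. The function name `craft_anydoor_perturbation`, the `loss_fn` interface, and all hyperparameters are illustrative assumptions, not the paper's exact objective.

```python
import torch

def craft_anydoor_perturbation(
    loss_fn,            # assumed interface: (images, prompts, targets) -> scalar loss
    images,             # clean images, shape (B, C, H, W), pixel values in [0, 1]
    benign_prompts,     # text prompts without the trigger phrase
    trigger_prompts,    # the same prompts with the trigger phrase inserted
    benign_targets,     # normal answers the model should keep producing
    harmful_targets,    # attacker-chosen outputs when the trigger is present
    epsilon=8 / 255,    # L-infinity budget keeping the perturbation imperceptible
    step_size=1 / 255,
    num_steps=500,
):
    """Sketch: optimize one universal perturbation that *sets up* the backdoor
    in the visual modality; the textual trigger later *activates* it."""
    # A single perturbation shared across all images in the batch.
    delta = torch.zeros_like(images[:1], requires_grad=True)

    for _ in range(num_steps):
        adv_images = (images + delta).clamp(0, 1)
        # Backdoor term: with the trigger in the prompt, emit the harmful target.
        loss_trigger = loss_fn(adv_images, trigger_prompts, harmful_targets)
        # Utility term: without the trigger, keep answering normally.
        loss_benign = loss_fn(adv_images, benign_prompts, benign_targets)
        loss = loss_trigger + loss_benign

        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()  # signed gradient step on the loss
            delta.clamp_(-epsilon, epsilon)         # project back into the budget
        delta.grad = None

    return delta.detach()
```

Under these assumptions, `loss_fn` would wrap an MLLM's cross-entropy over the target text given the (perturbed) image and prompt; the returned perturbation could then be added to any image the victim feeds the deployed model, with the trigger phrase alone deciding when the harmful behavior fires.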
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Daphne_Ippolito1
Submission Number: 5755