Poisoning-based Backdoor Attack against Vision-Language Model
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Security, Backdoor Attack, Data Poisoning, Visual Instruction Tuning, Large Language Models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: We delve into a novel methodology of performing stealthy backdoor attacks on large language models under visual instruction tuning, wherein we subtly infuse malicious triggers in both the textual and visual domains, thereby manipulating the model to exhibit predefined exploitable behaviors during the inference process. Our methodology, in contrast to existing works, does not necessitate access to model gradients and weights and ensures the deployment of clean images and labels. This strategic poisoning maintains stealth by generating correct outputs when tested on samples with untriggered images or instructions while revealing its strength by activating malicious behaviors when encountered with specified visual triggers and textual triggers at the same time. We further examine the effectiveness of our method on LLaVA model, exploring various tasks including string injection, over-refusal, and false knowledge injection, to demonstrate the versatility and robustness of our approach in diverse application scenarios. The results expose significant vulnerabilities in current models, emphasizing the need for developing advanced countermeasures and underlining the greater importance of data quality in the realm of visual instruction tuning.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8995
Loading