Keywords: Multimodal Large Language Models, Retrieval Augmented Generation, LLM security, Knowledge Poisoning Attack
TL;DR: This paper introduces MRAG-Corrupter, the first knowledge poisoning attack on multimodal RAG systems.
Abstract: Multimodal retrieval-augmented generation (RAG) enhances visual reasoning in vision-language models (VLMs) by accessing external knowledge bases. However, the security vulnerabilities of these systems remain largely unexplored. In this work, we introduce MRAG-Corrupter, the first knowledge poisoning attack on multimodal RAG systems. MRAG-Corrupter injects a few crafted image-text pairs into the knowledge database, manipulating VLMs into generating attacker-desired responses. We formalize the attack as an optimization problem and propose two cross-modal strategies, dirty-label and clean-label, tailored to the attacker’s knowledge and goals. Our experiments across multiple knowledge databases and VLMs show that MRAG-Corrupter outperforms existing methods, achieving up to a 98% attack success rate with only five malicious pairs injected into the InfoSeek database (481,782 pairs). We also evaluate four defense strategies—paraphrasing, duplicate removal, structure-driven mitigation, and purification—and find that they are of limited effectiveness against MRAG-Corrupter. Our results highlight the effectiveness of MRAG-Corrupter and underscore the threat it poses to multimodal RAG systems.
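To make the abstract's threat model concrete, here is a minimal sketch of retrieval poisoning in the dirty-label spirit. It is not the paper's optimization-based attack: the `embed` function is a hypothetical stand-in for a CLIP-style joint encoder, and the injected keys simply reuse the target query's embedding plus noise rather than being optimized.

```python
# Minimal sketch of knowledge poisoning against a multimodal RAG retriever.
# Assumptions (not from the paper): `embed` stands in for a CLIP-style joint
# encoder, and injected keys are placed near the target query by perturbing
# its embedding instead of solving the paper's optimization problem.
import numpy as np

def embed(item: str) -> np.ndarray:
    """Hypothetical stand-in for a joint image/text encoder; returns a unit vector."""
    rng = np.random.default_rng(abs(hash(item)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

# Benign knowledge base: (key_embedding, passage) pairs.
kb = [(embed(f"doc{i}"), f"benign passage {i}") for i in range(1000)]

# Attacker injects a handful of pairs whose keys sit near the target
# query's embedding and whose text carries the attacker-desired answer.
target_query = "Who founded this landmark?"
q = embed(target_query)
malicious_text = "It was founded by [attacker-desired answer]."
noise_rng = np.random.default_rng(0)
for _ in range(5):
    key = q + 0.01 * noise_rng.normal(size=q.shape)
    kb.append((key / np.linalg.norm(key), malicious_text))

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the k passages whose keys are most similar to the query."""
    qv = embed(query)
    ranked = sorted(kb, key=lambda kv: -float(kv[0] @ qv))
    return [text for _, text in ranked[:k]]

# With only 5 injected pairs among 1,000+, the retrieved context for the
# target query is dominated by the attacker's text, steering the VLM.
print(retrieve(target_query))
```

This toy setup only illustrates why a handful of injected pairs suffices: retrieval is nearest-neighbor search, so poisoned entries need only win the similarity ranking for the targeted query, not affect the rest of the database.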
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 12763