Keywords: Knowledge Editing, Editing Attack, Multimodal Language Models
Abstract: Multimodal Large Language Models (MLLMs) store vast amounts of factual knowledge, enabling complex reasoning and generative tasks. However, this knowledge is typically static, which raises the question of how to intervene in a model's knowledge in a targeted manner without compromising its overall behavior. In this work, we propose a novel Stealthy Fine-Grained Editing Attack (SFG-Attack) that subtly modifies multiple knowledge triples within a single image. To support research in this area, we construct the first benchmark for SFG-Attack, in which each image contains multiple factual triples and adversarial edits target specific keywords, enabling precise control. We also design six comprehensive evaluation metrics, including Intra-Preservation, Inter-Preservation, Reliability, and Generality. Experiments on mainstream models such as MiniGPT-4 and Qwen2.5-VL-3B reveal that the attack can selectively degrade specific knowledge while leaving other facts intact, that editing is sensitive to visual and semantic cues, and that even state-of-the-art models exhibit significant limitations. Our benchmark and metrics provide a standardized framework for studying fine-grained adversarial knowledge manipulation in multimodal models. Code is available at https://anonymous.4open.science/r/SFG-Attack-CF19/.
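The abstract names four of the six metrics without defining them. As a minimal, purely illustrative sketch of how such metrics could be scored over a benchmark of this shape (each image annotated with several knowledge triples, some adversarially edited), the Python below computes Reliability, Generality, Intra-Preservation, and Inter-Preservation as exact-match accuracies. All names here (`Triple`, `ImageEntry`, `ask`, `question`) are hypothetical and are not taken from the paper or its repository; the authors' actual definitions may differ.

```python
# Illustrative-only sketch; none of these names come from the paper or repo.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str                                                # ground-truth object
    edited_obj: Optional[str] = None                        # adversarial target, if attacked
    paraphrases: list[str] = field(default_factory=list)    # rephrased queries

@dataclass
class ImageEntry:
    image_id: str
    triples: list[Triple] = field(default_factory=list)

# (image_id, question) -> model answer, queried after the edit is applied
AskFn = Callable[[str, str], str]

def question(t: Triple) -> str:
    # Naive cloze-style query; a real benchmark would use curated prompts.
    return f"What is the {t.relation} of {t.subject}?"

def _acc(pairs: list[tuple[str, str]]) -> float:
    # Exact-match accuracy over (prediction, expected) pairs.
    return sum(p == e for p, e in pairs) / max(len(pairs), 1)

def score(ask: AskFn, entry: ImageEntry, others: list[ImageEntry]) -> dict[str, float]:
    attacked = [t for t in entry.triples if t.edited_obj is not None]
    untouched = [t for t in entry.triples if t.edited_obj is None]
    return {
        # Reliability: attacked triples now yield the adversarial object.
        "reliability": _acc([(ask(entry.image_id, question(t)), t.edited_obj)
                             for t in attacked]),
        # Generality: paraphrased queries about attacked triples also flip.
        "generality": _acc([(ask(entry.image_id, q), t.edited_obj)
                            for t in attacked for q in t.paraphrases]),
        # Intra-Preservation: untouched triples in the same image stay intact.
        "intra_preservation": _acc([(ask(entry.image_id, question(t)), t.obj)
                                    for t in untouched]),
        # Inter-Preservation: triples in unrelated images stay intact.
        "inter_preservation": _acc([(ask(e.image_id, question(t)), t.obj)
                                    for e in others for t in e.triples]),
    }
```

In practice, `ask` would wrap the edited MLLM's question-answering interface, and exact match could be replaced by whatever scoring rule the benchmark actually specifies.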
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: benchmarking, multimodal QA
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English
Submission Number: 9164