Keywords: Knowledge Editing, Editing Attack, Multimodal Language Models
Abstract: Multimodal Large Language Models (MLLMs) store vast amounts of factual knowledge, enabling complex reasoning and generative tasks. However, this knowledge is typically static, which raises the question of how to intervene in a model's knowledge in a targeted manner without compromising its overall behavior. In this work, we propose a novel Stealthy Fine-Grained Editing Attack (SFG-Attack) that subtly modifies multiple knowledge triples within a single image. To support research in this area, we construct the first benchmark for SFG-Attack, in which each image contains multiple factual triples and adversarial edits target specific keywords, enabling precise control. We also design six comprehensive evaluation metrics, including Intra-Preservation, Inter-Preservation, Reliability, and Generality. Experiments on mainstream models such as MiniGPT-4 and Qwen2.5-VL-3B reveal that the attack can selectively degrade specific knowledge while leaving other facts intact, that editing is sensitive to visual and semantic cues, and that even state-of-the-art models exhibit significant limitations. Our benchmark and metrics provide a standardized framework for studying fine-grained adversarial knowledge manipulation in multimodal models. Code is available at https://anonymous.4open.science/r/SFG-Attack-CF19/.
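The abstract names four of the six metrics without defining them. As a minimal, purely illustrative sketch of how such metrics could be scored over a benchmark of this shape (each image annotated with several knowledge triples, some adversarially edited), the Python below computes Reliability, Generality, Intra-Preservation, and Inter-Preservation as exact-match accuracies. All names here (`Triple`, `ImageEntry`, `ask`, `question`) are hypothetical and are not taken from the paper or its repository; the authors' actual definitions may differ.

```python
# Illustrative-only sketch; none of these names come from the paper or repo.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str                                                # ground-truth object
    edited_obj: Optional[str] = None                        # adversarial target, if attacked
    paraphrases: list[str] = field(default_factory=list)    # rephrased queries

@dataclass
class ImageEntry:
    image_id: str
    triples: list[Triple] = field(default_factory=list)

# (image_id, question) -> model answer, queried after the edit is applied
AskFn = Callable[[str, str], str]

def question(t: Triple) -> str:
    # Naive cloze-style query; a real benchmark would use curated prompts.
    return f"What is the {t.relation} of {t.subject}?"

def _acc(pairs: list[tuple[str, str]]) -> float:
    # Exact-match accuracy over (prediction, expected) pairs.
    return sum(p == e for p, e in pairs) / max(len(pairs), 1)

def score(ask: AskFn, entry: ImageEntry, others: list[ImageEntry]) -> dict[str, float]:
    attacked = [t for t in entry.triples if t.edited_obj is not None]
    untouched = [t for t in entry.triples if t.edited_obj is None]
    return {
        # Reliability: attacked triples now yield the adversarial object.
        "reliability": _acc([(ask(entry.image_id, question(t)), t.edited_obj)
                             for t in attacked]),
        # Generality: paraphrased queries about attacked triples also flip.
        "generality": _acc([(ask(entry.image_id, q), t.edited_obj)
                            for t in attacked for q in t.paraphrases]),
        # Intra-Preservation: untouched triples in the same image stay intact.
        "intra_preservation": _acc([(ask(entry.image_id, question(t)), t.obj)
                                    for t in untouched]),
        # Inter-Preservation: triples in unrelated images stay intact.
        "inter_preservation": _acc([(ask(e.image_id, question(t)), t.obj)
                                    for e in others for t in e.triples]),
    }
```

In practice, `ask` would wrap the edited MLLM's question-answering interface, and exact match could be replaced by whatever scoring rule the benchmark actually specifies.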
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: benchmarking, multimodal QA
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English
Submission Number: 9164