Stealthy Fine-Grained Editing Attack on MLLMs

ICLR 2026 Conference Submission17454 Authors

19 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Knowledge Editing, Editing Attack, Multimodal Language Models, Knowledge Graph
Abstract: Knowledge editing enables large models to update facts without costly retraining, but recent work shows it can be misused for adversarial knowledge injection. Prior studies mainly target large language models, leaving multimodal scenarios underexplored. We introduce the Stealthy Fine-Grained Editing Attack (SFG-Attack) and the Stealthiness Attack Dataset, both designed for multimodal models. Unlike traditional datasets, ours provides professional, fine-grained data: each image contains a unique entity with multiple associated knowledge facts, and attacks target specific keywords for precise control. We further propose a new metric, Stealthiness, which measures an edit's impact on other knowledge associated with the same image. In addition, we redefine Reliability, Locality, and Generality, and introduce Robustness as a new dimension for assessing model stability under perturbations. Together, these advances provide both data and methodology for strengthening the safety evaluation and defense of multimodal models. Code is available at https://anonymous.4open.science/r/SFG-Attack-CF19/.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17454