Abstract: Multimodal knowledge editing is an important technique for updating outdated or incorrect knowledge in Multimodal Large Language Models (MLLMs). However, existing multimodal knowledge editing datasets lack multi-granularity knowledge. In this paper, we present a more realistic dataset, M2Edit, which covers three distinct types of knowledge: entity, relation, and action. Moreover, existing knowledge editing methods for MLLMs can neither handle multi-granularity knowledge nor generalize to multimodal data. To address these limitations, we propose MLE, a multimodal knowledge editing method that identifies the key knowledge layers within the different components of an MLLM and edits those components collaboratively. As a result, we observe significant improvements in visual generality, ranging from 4.8 to 10.8, and achieve the best overall performance on knowledge of different granularities.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodal Knowledge Editing; Multi-Granularity Knowledge; M2Edit; Multimodal Large Language Models
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 7993