Keywords: Model Editing, Knowledge Editing, Video-LLMs, Benchmarking
Abstract: Model editing, also known as knowledge editing, is receiving increasing attention in the field of Large Language Models (LLMs). However, existing model editing approaches focus predominantly on textual knowledge or static visual domains, overlooking dynamic video semantics. This paper presents an exploratory application of six representative model editing methods (FT, IKE, MEND, SERAC, MEMIT, and AlphaEdit) to Video Large Language Models (Vid-LLMs) and introduces $\textbf{VMEB}$ ($\textbf{V}$id-LLMs $\textbf{M}$odel $\textbf{E}$diting $\textbf{B}$enchmark), the first benchmark specifically designed for Vid-LLM editing, systematically extending model editing research from static modalities to dynamic video scenarios. In the video paradigm, our evaluation covers the traditional metrics of Reliability, Locality, and Generality, and introduces a video-specific metric: Robustness. Based on the experimental results, we analyze the strengths and limitations of existing model editing approaches and identify new challenges and research directions for the future development of model editing in multimodal and video settings.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Model Editing, Multimodality, Benchmarking
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 5520