When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations


ICLR 2026 Conference Submission 140 Authors

01 Sept 2025 (modified: 23 Dec 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Evolving Knowledge Injection; Large multimodal model; Benchmark and Dataset

TL;DR: This work introduces the MMEVOKE benchmark to reveal challenges in knowledge injection and explores potential solutions.

Abstract: Large Multimodal Models (LMMs) store vast amounts of pretrained knowledge but struggle to stay aligned with real-world updates, which makes it difficult to acquire evolving knowledge without degrading existing capabilities. Moreover, most current work explores the injection of static textual knowledge and neglects dynamic, multimodal evolving knowledge, leaving the potential of LMMs for multimodal knowledge injection an open question. To address this, we first propose a pipeline to construct MMEVOKE, a benchmark for evaluating LMMs' ability to absorb injected multimodal evolving knowledge. MMEVOKE contains 9,422 samples spanning 159 subtypes. Then, through extensive experiments on MMEVOKE, comprising knowledge injection tests and general capability tests, we reveal challenges in existing knowledge injection methods, such as poor injection performance and capability degradation. Finally, to tackle these challenges, we introduce knowledge augmentation and knowledge retention methods, and find that knowledge-aware augmentation strengthens knowledge injection performance, while Data Replay and MoE methods effectively mitigate capability degradation.

Supplementary Material: pdf

Primary Area: datasets and benchmarks

Submission Number: 140
