Multimedia Event Extraction with LLM Knowledge Editing

ACL ARR 2025 May Submission1880 Authors

18 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: The multimodal event extraction task aims to identify event types and arguments from visual and textual representations of events. Due to the high cost of multimedia training data, previous methods have mainly relied on weak alignment of strong unimodal encoders. However, they ignore the conflict between event understanding and image recognition, so redundant feature perception impairs the understanding of multimodal events. In this paper, we propose a multimodal event extraction strategy with a multi-level redundant feature selection mechanism, which enhances the event understanding ability of multimodal large language models via knowledge editing techniques and requires no additional parameter optimization. Extensive experiments show that our method outperforms state-of-the-art (SOTA) baselines on the M2E2 benchmark. Compared with the strongest baseline, we achieve a 34\% improvement in precision on event extraction and an 11\% improvement in F1 on argument extraction.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction, Multimodality and Language Grounding to Vision, Robotics and Beyond
Languages Studied: English
Submission Number: 1880