MPT: Multimodal Prompt Tuning for Event Detection

Anonymous

17 Feb 2023 (modified: 05 May 2023) · ACL ARR 2023 February Blind Submission
Abstract: Event detection is a key and challenging sub-task of event extraction that suffers from severe trigger-word ambiguity. Existing studies focus mainly on contextual information in text, yet news articles naturally contain many images that remain to be explored. We argue that images not only reflect the core events of the text but also help disambiguate trigger words. In this paper, we propose a new bi-recursive Multimodal Prompt Tuning (MPT) model that enables deep interaction between images and sentences to aggregate features across modalities. MPT uses pre-trained CLIP to encode sentences and images and map them into the same multimodal semantic space, then applies alternating dual attention to select informative features for mutual enhancement. We further propose a multimodally guided soft-prompt method in which the fused multimodal information guides the downstream event detection task. Superior performance over six state-of-the-art baselines, together with ablation studies, demonstrates the importance of the image modality and the effectiveness of the proposed architecture.
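The paper itself is not reproduced here, but the "alternating dual attention" fusion the abstract describes can be sketched as repeated cross-attention in each direction over CLIP-style token features. This is purely an illustrative assumption: the function names, the residual update, and the number of rounds are ours, not the authors' specification.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # scaled dot-product attention: each query attends over all key/value rows
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values

def alternating_dual_attention(text_feats, image_feats, rounds=2):
    # alternately enhance each modality with information selected from the other;
    # both inputs are assumed to live in the same (CLIP-aligned) semantic space
    for _ in range(rounds):
        text_feats = text_feats + cross_attention(text_feats, image_feats)
        image_feats = image_feats + cross_attention(image_feats, text_feats)
    return text_feats, image_feats
```

In this sketch the two modalities take turns as queries, so each round lets the text representation absorb image evidence (e.g. for trigger disambiguation) and vice versa; the fused features would then feed the soft-prompt construction described in the abstract.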
Paper Type: long
Research Area: Information Extraction