Shap-CA: Shapley Value-based Contrastive Alignment for Multimodal Information Extraction

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone
TL;DR: We propose Shap-CA, a novel Shapley value-based contrastive alignment method for Multimodal Information Extraction.
Abstract: Recently, Multimodal Information Extraction (MIE) has attracted considerable attention. Most existing methods focus on direct image-text interactions and face significant challenges due to the semantic and modality gaps between images and text. In this paper, we introduce a new paradigm of image-context-text interaction, leveraging large multimodal models (LMMs) to generate descriptive textual context as a bridge across these gaps. Following this paradigm, we propose a novel method, Shapley Value-based Contrastive Alignment (Shap-CA), which aligns both context-text and context-image pairs. Shap-CA first applies the Shapley value to measure the individual contribution of each element in context-text and context-image pairs to the overall semantic and modality overlaps, respectively, and then employs a contrastive learning strategy to maximize the contributions from relevant pairs and minimize those from irrelevant ones. Furthermore, we incorporate an adaptive fusion module for selective cross-modal fusion. Extensive experiments across four MIE datasets demonstrate that our method significantly outperforms existing state-of-the-art methods. Code will be released upon acceptance.
Paper Type: long
Research Area: Information Extraction
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
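
Illustrative sketch (not the authors' code): the abstract describes scoring each element's contribution with the Shapley value and then applying a contrastive objective to those contributions. The Python below shows the general idea on a toy example. Everything specific in it is an assumption made for illustration: the `toy_overlap` value function, the scores in `TOY_SCORES`, the element names `c1` to `c3`, and the generic InfoNCE-style loss are hypothetical stand-ins, since the abstract does not specify the paper's actual value function, approximation scheme, or loss.

```python
from itertools import combinations
from math import exp, factorial, log, prod

# Hypothetical relevance scores of three context elements (assumed values).
TOY_SCORES = {"c1": 0.9, "c2": 0.5, "c3": 0.1}


def toy_overlap(coalition):
    """Assumed value function: overlap achieved by a set of elements.

    Non-additive on purpose, so the Shapley values differ from the raw scores.
    The empty coalition has value 0.
    """
    return 1.0 - prod(1.0 - TOY_SCORES[p] for p in coalition)


def shapley_values(players, value):
    """Exact Shapley value of every player by enumerating all coalitions."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def contrastive_loss(contribs, positive, temperature=0.1):
    """Generic InfoNCE-style loss over contributions (an assumed form).

    Lower when the positive element's contribution dominates the others.
    """
    logits = {k: v / temperature for k, v in contribs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_denom = m + log(sum(exp(v - m) for v in logits.values()))
    return log_denom - logits[positive]


phi = shapley_values(list(TOY_SCORES), toy_overlap)
loss_if_c1_relevant = contrastive_loss(phi, "c1")
```

Exact enumeration costs time exponential in the number of elements, so a sketch like this only suits tiny toy games; any practical system would need an approximation, which the abstract does not describe.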