A Unified Visual Prompt Tuning Framework with Mixture-of-Experts for Multimodal Information Extraction

Published: 01 Jan 2023, Last Modified: 18 Dec 2023, DASFAA (3) 2023, Readers: Everyone
Abstract: Multimodal information extraction has recently gained increasing attention in social media understanding: images serve as auxiliary information that helps resolve the ambiguity caused by the insufficient semantic information in short texts. Despite their success, current methods do not take full advantage of the diverse representations that images provide. To address this problem, we propose a novel unified visual prompt tuning framework with Mixture-of-Experts that fuses different types of image representations for multimodal information extraction. Extensive experiments on two different multimodal information extraction tasks demonstrate the effectiveness of our method. The source code is available at https://github.com/xubodhu/VisualPT-MoE.