Integrating Large Language Models in Multimodal Entity Linking: A Novel Two-Level Reflection Framework

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: Multimodal Entity Linking (MEL) is an essential technology in numerous applications. Existing methods depend on designing complex multimodal interaction modules and require extensive domain-specific training data. As the traditional pretrain-finetune paradigm evolves toward prompt engineering with large language models (LLMs), investigating prompt-engineering-based MEL approaches becomes increasingly important. However, using LLMs with straightforward instructions presents challenges for MEL: fine-grained entity selection that is unfaithful to the context, and the overlooking of key details due to information overload. To this end, this paper introduces SMCR, a novel two-level reflection framework for MEL in which an LLM performs entity selection. To address context unfaithfulness, we implement semantic consistency reflection based on the LLM's self-feedback. To simplify image utilization and alleviate information overload, we introduce modality consistency reflection, which iteratively integrates visual clues through external feedback. Experimental results on two established public MEL datasets show that our solution achieves state-of-the-art performance, and further analysis confirms the effectiveness of the proposed modules. Our code is available at https://anonymous.4open.science/r/SMCR-1215.
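The abstract describes the two reflection levels only at a high level. A minimal Python sketch of how such a loop might be organized is given below; the helpers llm.select, llm.self_check, llm.revise, and vision_scorer.compare are hypothetical names introduced here for illustration and do not appear in the paper or its code.

    # Hypothetical sketch of a two-level reflection loop for multimodal
    # entity linking. All interfaces are illustrative assumptions, not
    # the paper's actual API.

    def link_entity(mention: str, context: str, image, candidates: list[str],
                    llm, vision_scorer, max_rounds: int = 3) -> str:
        """Select an entity for `mention`, refining via two reflection stages."""
        answer = llm.select(mention, context, candidates)  # initial LLM choice

        for _ in range(max_rounds):
            # Level 1: semantic consistency reflection (LLM self-feedback).
            # The LLM judges whether its choice is faithful to the textual
            # context and, if not, revises its own answer.
            critique = llm.self_check(answer, mention, context)
            if not critique.consistent:
                answer = llm.revise(answer, critique.feedback, candidates)
                continue

            # Level 2: modality consistency reflection (external feedback).
            # An external vision model scores agreement between the image and
            # the chosen entity; low agreement feeds visual clues back in.
            score, clues = vision_scorer.compare(image, answer)
            if score < vision_scorer.threshold:
                answer = llm.revise(answer, clues, candidates)
                continue

            break  # both reflections passed
        return answer

Under these assumptions, the loop terminates either when both reflections pass or after a fixed budget of revision rounds, mirroring the iterative integration of visual clues described in the abstract.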
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: NLP engineering experiment
Languages Studied: English