MM-ChatAlign: A Novel Multimodal Reasoning Framework based on Large Language Models for Entity Alignment

ACL ARR 2024 June Submission3567 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Multimodal entity alignment (MMEA) integrates multi-source and cross-modal knowledge graphs, which is a crucial yet challenging task for data-centric applications. Traditional MMEA methods derive visual embeddings of entities and combine them with other modal data, performing alignment by embedding similarity comparison. However, these methods are hampered by limited comprehension of visual attributes and by deficiencies in understanding and bridging the semantics of multimodal data. To address these challenges, we propose MM-ChatAlign, a novel framework that utilizes the visual reasoning abilities of multimodal large language models (MLLMs) for MMEA. The framework features an embedding-based candidate collection module that adapts to various knowledge representation strategies and effectively filters out irrelevant reasoning candidates. Additionally, a reasoning and rethinking module, powered by MLLMs, enhances alignment by efficiently utilizing multimodal information. Extensive experiments on four MMEA datasets demonstrate MM-ChatAlign's superiority and underscore the significant potential of MLLMs in MMEA tasks. The source code is available at https://anonymous.4open.science/r/MMEA/.
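To make the embedding-based candidate collection step concrete, the following is a minimal sketch (not the authors' implementation) of how such a module is commonly realized: each source entity is matched to its top-k most similar target entities by cosine similarity over precomputed entity embeddings, and only these candidates are passed on to the MLLM reasoning stage. The function and variable names here are illustrative assumptions.

```python
import numpy as np

def collect_candidates(src_emb: np.ndarray, tgt_emb: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Hypothetical candidate collection: for each source entity, keep the
    top-k most similar target entities by cosine similarity, filtering out
    the rest before MLLM-based reasoning."""
    # L2-normalize so that the dot product equals cosine similarity
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                          # (n_src, n_tgt) similarity matrix
    # indices of the top-k target entities per source entity, most similar first
    return np.argsort(-sim, axis=1)[:, :top_k]

# Usage (embeddings from any KG encoder; names are placeholders):
# candidates = collect_candidates(kg1_embeddings, kg2_embeddings, top_k=20)
```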
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal entity alignment, large language models, cross-modal application
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches low compute settings-efficiency, Data resources, Data analysis
Languages Studied: English
Submission Number: 3567