Automatic Fusion for Multimodal Entity Alignment: A New Perspective from Automatic Architecture Search
Abstract: Integrating multimodal data from diverse sources is crucial for many downstream applications. Multimodal entity alignment (MMEA), which discovers equivalent entities across different sources and modalities, aims to eliminate data silos and enable comprehensive integration. A key challenge in MMEA is effectively fusing the vector representations of an entity's different modalities for optimal entity matching. Existing fusion methods rely either on individual fusion operators (e.g., concatenation and summation) or on manually designed complex network structures, incurring substantial human effort. In this paper, we introduce, for the first time, the research question of automatic fusion for MMEA and propose an efficient approach from the perspective of automated architecture search. Experimental comparisons with state-of-the-art methods on real-world datasets demonstrate the effectiveness of the proposed approach.
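To make the search-based fusion idea concrete, below is a minimal sketch, not the paper's actual method, of a differentiable search over simple fusion operators in the spirit of DARTS-style architecture search: each candidate operator (summation, element-wise product, concatenation with projection) is mixed via learned architecture weights. The class name `FusionSearchSpace`, the choice of candidate operators, and the softmax-weighted mixture are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionSearchSpace(nn.Module):
    """Soft mixture over candidate fusion operators for two modalities.

    Hypothetical sketch: a differentiable relaxation over a small set of
    fusion operators, whose mixture weights (alpha) are learned jointly
    with the model parameters.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Project a concatenation back to `dim` so every candidate
        # operator produces embeddings of the same size.
        self.concat_proj = nn.Linear(2 * dim, dim)
        # One architecture weight per candidate operator.
        self.alpha = nn.Parameter(torch.zeros(3))

    def candidates(self, x: torch.Tensor, y: torch.Tensor):
        return [
            x + y,                                    # summation
            x * y,                                    # element-wise product
            self.concat_proj(torch.cat([x, y], -1)), # concatenation + projection
        ]

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        ops = self.candidates(x, y)
        return sum(w * op for w, op in zip(weights, ops))

# Usage: fuse two modality embeddings of the same entity.
fuser = FusionSearchSpace(dim=128)
img_emb = torch.randn(32, 128)   # e.g., image-modality embeddings
rel_emb = torch.randn(32, 128)   # e.g., graph-structure embeddings
fused = fuser(img_emb, rel_emb)  # (32, 128) fused entity embeddings
```

After search, one would typically discretize by keeping the operator with the largest learned weight, so the final model pays no mixture overhead at inference time.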