Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where entities can be associated with related images.
Most existing studies rely heavily on automatically learned multi-modal fusion modules, which may admit redundant information, such as misleading clues, into the generated entity representations, impeding the feature consistency of equivalent entities.
To this end, we propose a variational framework for MMEA via the information bottleneck, termed IBMEA, which emphasizes alignment-relevant information while suppressing alignment-irrelevant information in entity representations.
Specifically, we first develop multi-modal variational encoders that represent modal-specific features as probability distributions.
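As an illustration, such a modal-specific variational encoder can be sketched in PyTorch as follows; the class name, layer choices, and use of a diagonal Gaussian are our own illustrative assumptions rather than the paper's exact implementation:

    import torch
    import torch.nn as nn

    class ModalVariationalEncoder(nn.Module):
        """Encodes one modality's features as a diagonal Gaussian N(mu, sigma^2)."""
        def __init__(self, in_dim: int, latent_dim: int):
            super().__init__()
            self.mu_head = nn.Linear(in_dim, latent_dim)      # mean of the latent Gaussian
            self.logvar_head = nn.Linear(in_dim, latent_dim)  # log-variance of the latent Gaussian

        def forward(self, x: torch.Tensor):
            mu, logvar = self.mu_head(x), self.logvar_head(x)
            # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
            # keeping the sampled representation differentiable w.r.t. mu and logvar.
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return z, mu, logvar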
Then, we propose four modal-specific information bottleneck regularizers to suppress misleading clues in the modal-specific entity representations.
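The abstract does not spell out the exact form of these regularizers; assuming the standard variational information bottleneck bound (Alemi et al., 2017), a regularizer for modality m with latent representation z_m and alignment label y would take the form

    \mathcal{L}^{(m)}_{\mathrm{IB}} \;=\; \mathbb{E}_{q_\phi(z_m \mid x_m)}\!\left[ -\log p_\theta(y \mid z_m) \right] \;+\; \beta \, \mathrm{KL}\!\left( q_\phi(z_m \mid x_m) \,\middle\|\, p(z_m) \right),

where the first term retains alignment-relevant information, the KL term compresses alignment-irrelevant information toward the prior p(z_m), and \beta trades off the two.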
Finally, we propose a modal-hybrid information contrastive regularizer that integrates the modal-specific representations and enforces the similarity of equivalent entities between MMKGs, thereby achieving MMEA.
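The abstract likewise leaves the contrastive regularizer unspecified; one common instantiation over seed-aligned entity pairs is a symmetric InfoNCE loss over the hybrid representations, sketched below (the function name and temperature value are hypothetical):

    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(z_src: torch.Tensor,
                                   z_tgt: torch.Tensor,
                                   temperature: float = 0.1) -> torch.Tensor:
        """Row i of z_src and z_tgt is an aligned entity pair (positive);
        all other in-batch pairs act as negatives."""
        z_src = F.normalize(z_src, dim=-1)
        z_tgt = F.normalize(z_tgt, dim=-1)
        logits = z_src @ z_tgt.t() / temperature               # pairwise cosine similarities
        targets = torch.arange(z_src.size(0), device=z_src.device)
        # Symmetrize over both alignment directions (KG1 -> KG2 and KG2 -> KG1).
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))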
We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods and shows promising, robust performance, especially in low-resource and high-noise data scenarios.
Primary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: This work focuses on an important task in multimodal research, multi-modal entity alignment (MMEA). Existing studies usually employ automatically learned multimodal fusion modules, which can hardly suppress redundant information in the fused representations and may introduce misleading information into task prediction. To this end, this paper proposes IBMEA, an innovative multimodal variational framework that exploits the variational information bottleneck to fuse information from different modalities, maximizing the retention of alignment-relevant information while minimizing alignment-irrelevant information. Experimental results demonstrate the effectiveness of the proposed IBMEA. In general, our work provides a novel perspective on MMEA through the information bottleneck and, to our knowledge, is the first to explore IB for alleviating misleading information in MMEA. We believe this work can benefit a range of multimodal knowledge graph applications, help advance multimedia and multimodal processing, and attract broad interest from the multimodal research community.
Supplementary Material: zip
Submission Number: 1955