Abstract: Multi-modal entity alignment (EA) is an important extension of the EA task. Existing multi-modal EA methods supplement the missing image features of entities with randomly generated embeddings, ignoring the correlation between entities and their images. In this paper, we propose IVMEA, a multi-modal EA framework for settings with imbalanced visual modality information. Specifically, IVMEA first establishes a mapping network from semantic features to image features, and then trains this network with both an alignment loss and a cross-modal loss to obtain semantic-aware image embeddings for multi-modal EA. Finally, the multi-modal EA task is completed based on the similarities of the multi-modal feature embeddings. Experimental results show that IVMEA achieves state-of-the-art (SOTA) performance on three cross-lingual datasets with limited parameters and runtime across various imbalanced visual modality scenarios.
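As a rough illustration of the described pipeline (not the authors' implementation), the sketch below shows a hypothetical semantic-to-visual mapping network trained with a cross-modal reconstruction loss plus a margin-based alignment loss; all module names, dimensions, and the concrete loss forms are assumptions made for this example.

```python
# Illustrative sketch only: a minimal mapping network from semantic to visual
# embedding space, trained with a cross-modal loss (for entities that do have
# images) and a margin-based alignment loss. Names and dimensions are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticToVisualMapper(nn.Module):
    """Maps semantic embeddings of entities into the image feature space."""

    def __init__(self, sem_dim: int = 300, img_dim: int = 2048, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sem_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, sem_emb: torch.Tensor) -> torch.Tensor:
        return self.net(sem_emb)


def cross_modal_loss(pred_img: torch.Tensor, real_img: torch.Tensor) -> torch.Tensor:
    # Pull generated image embeddings toward the real ones, so the mapper
    # learns semantic-aware image features for entities lacking images.
    return F.mse_loss(pred_img, real_img)


def alignment_loss(src: torch.Tensor, tgt: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # Margin loss: each aligned pair (row i of src and tgt) should be closer
    # than a randomly permuted negative pair.
    pos = F.pairwise_distance(src, tgt)
    neg = F.pairwise_distance(src, tgt[torch.randperm(tgt.size(0))])
    return F.relu(margin + pos - neg).mean()


if __name__ == "__main__":
    mapper = SemanticToVisualMapper()
    opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

    # Toy batch: 32 entities with semantic embeddings; the first 16 have images.
    sem = torch.randn(32, 300)
    real_img = torch.randn(16, 2048)
    counterpart_sem = torch.randn(32, 300)  # aligned entities in the other KG

    pred_img = mapper(sem)
    loss = cross_modal_loss(pred_img[:16], real_img) + alignment_loss(
        pred_img, mapper(counterpart_sem)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, such mapped embeddings would stand in for the missing image features, and alignment would be decided by similarity over the combined multi-modal embeddings, as the abstract describes.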
External IDs: dblp:conf/icassp/ZhangLS25