Abstract: Multi-modal entity alignment (EA) is an important extension of the EA task. Existing multi-modal EA methods supplement the missing image features of entities with randomly generated embeddings, ignoring the correlation between entities and their images. In this paper, we propose IVMEA, a multi-modal EA framework for settings with imbalanced visual modality information. Specifically, IVMEA first establishes a mapping network from semantic features to image features, and then trains this network with both an alignment loss and a cross-modal loss to obtain semantic-aware image embeddings for multi-modal EA. Finally, the multi-modal EA task is completed based on the similarities of the multi-modal feature embeddings. Experimental results show that IVMEA achieves state-of-the-art (SOTA) performance on three cross-lingual datasets with limited parameters and runtime across various imbalanced visual modality scenarios.
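As a rough illustration of the described pipeline (not the authors' implementation), the sketch below shows a hypothetical semantic-to-visual mapping network trained with a cross-modal reconstruction loss plus a margin-based alignment loss; all module names, dimensions, and the concrete loss forms are assumptions made for this example.

```python
# Illustrative sketch only: a minimal mapping network from semantic to visual
# embedding space, trained with a cross-modal loss (for entities that do have
# images) and a margin-based alignment loss. Names and dimensions are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticToVisualMapper(nn.Module):
    """Maps semantic embeddings of entities into the image feature space."""

    def __init__(self, sem_dim: int = 300, img_dim: int = 2048, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sem_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, sem_emb: torch.Tensor) -> torch.Tensor:
        return self.net(sem_emb)


def cross_modal_loss(pred_img: torch.Tensor, real_img: torch.Tensor) -> torch.Tensor:
    # Pull generated image embeddings toward the real ones, so the mapper
    # learns semantic-aware image features for entities lacking images.
    return F.mse_loss(pred_img, real_img)


def alignment_loss(src: torch.Tensor, tgt: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # Margin loss: each aligned pair (row i of src and tgt) should be closer
    # than a randomly permuted negative pair.
    pos = F.pairwise_distance(src, tgt)
    neg = F.pairwise_distance(src, tgt[torch.randperm(tgt.size(0))])
    return F.relu(margin + pos - neg).mean()


if __name__ == "__main__":
    mapper = SemanticToVisualMapper()
    opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

    # Toy batch: 32 entities with semantic embeddings; the first 16 have images.
    sem = torch.randn(32, 300)
    real_img = torch.randn(16, 2048)
    counterpart_sem = torch.randn(32, 300)  # aligned entities in the other KG

    pred_img = mapper(sem)
    loss = cross_modal_loss(pred_img[:16], real_img) + alignment_loss(
        pred_img, mapper(counterpart_sem)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, such mapped embeddings would stand in for the missing image features, and alignment would be decided by similarity over the combined multi-modal embeddings, as the abstract describes.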
External IDs: dblp:conf/icassp/ZhangLS25