WDEA: The Structure and Semantic Fusion With Wasserstein Distance for Low-Resource Language Entity Alignment
Abstract: Entity Alignment (EA) aims to identify pairs of entities from two distinct language knowledge graphs (KGs) that represent the same real-world objects. Current EA methods have exhibited impressive performance by leveraging both structural and semantic information. However, these approaches often falter when confronted with EA in low-resource languages. The primary challenge is that low-resource language KGs have sparse graph structures, making it difficult to obtain accurate entity representations; high-quality entity representations are the key to improving EA performance. Therefore, we propose augmenting entity representations with additional features derived from within the graph. In this paper, we introduce a novel approach: Structure and Semantic Fusion with Wasserstein Distance for Low-Resource Language Entity Alignment (WDEA). Our method integrates structural and semantic information using the Wasserstein distance (WD). Specifically, we design a Wasserstein Graph Convolutional Network (WGCN), a GNN-based model that integrates multi-hop information through a WD-based message passing mechanism. Additionally, our method adapts the semantic information from a pre-trained language model into the Wasserstein space to facilitate smooth integration. We also propose the Wasserstein Fusion Encoder (WFE), which effectively combines structural and semantic information in the Wasserstein space. To validate the efficacy of our proposed method, we construct low-resource language EA datasets encompassing uncommon linguistic varieties with sparser structures than mainstream datasets. Experimental results demonstrate significant performance gains over prevailing baseline models in low-resource language EA across various information configurations.
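Since the abstract centers on the Wasserstein distance as the fusion mechanism, a minimal illustration of the quantity itself may help. The sketch below computes the 1-dimensional Wasserstein-1 distance between two equal-size empirical samples, which reduces to the mean absolute difference of the sorted values. This is only a generic illustration of WD, not the paper's WGCN or WFE, and the function name `wasserstein_1d` is our own.

```python
def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two equal-size empirical samples.

    For 1-D empirical distributions with the same number of points, the
    optimal transport plan simply matches sorted values, so W1 is the
    mean absolute difference of the sorted samples.
    """
    if len(u) != len(v):
        raise ValueError("this simplified form requires equal sample sizes")
    u_sorted, v_sorted = sorted(u), sorted(v)
    return sum(abs(a - b) for a, b in zip(u_sorted, v_sorted)) / len(u)


# Example: two toy "embedding value" samples; each sorted pair is 5 apart.
print(wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))  # → 5.0
```

In the general (multi-dimensional, unequal-mass) setting used for distributions over entity representations, WD requires solving an optimal transport problem rather than a sort, but the 1-D case conveys the core idea of measuring the cost of moving one distribution onto another.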