Keywords: entity alignment, knowledge graphs, long-tailed entities, contrastive learning
TL;DR: Contrastive learning-based cross-lingual entity alignment
Abstract: Entity alignment (EA) models rely mostly on the triples and structural information of Knowledge Graphs (KGs), but underperform on sparsely connected long-tailed entities. We address this gap by proposing a model, \textbf{ContrastEA}, that leverages pre-trained language models (LMs), e.g., mE5, to generate entity representations, followed by a novel contrastive learning approach that combines a hard-negative mining strategy, selecting the \textit{top-k} negatives per entity, with an NT-Xent loss to separate challenging entity pairs. In addition, to address the under-representation of long-tailed entities in benchmark datasets, we curate a new dataset from DBpedia comprising long-tailed entities in seven languages (Arabic, German, Portuguese, Italian, Hindi, Russian, and Japanese), each aligned to English, for a total of 154,296 cross-lingual entity pairs. Our results demonstrate that ContrastEA outperforms classic EA models on three benchmark datasets, improving Hits@1 by 6--20 percentage points, and achieves state-of-the-art performance over long-tailed EA models on the curated dataset.
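For concreteness, the sketch below illustrates how an NT-Xent loss with top-k hard-negative mining, as described in the abstract, can be computed over LM-derived entity embeddings. This is a minimal PyTorch sketch, not the authors' implementation; the function name, `temperature`, and `k` values are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of NT-Xent with top-k hard negatives.
# Assumes src_emb[i] and tgt_emb[i] are embeddings of an aligned cross-lingual
# entity pair, e.g. produced by a pre-trained encoder such as mE5.
import torch
import torch.nn.functional as F

def ntxent_topk(src_emb: torch.Tensor,
                tgt_emb: torch.Tensor,
                temperature: float = 0.05,   # illustrative value
                k: int = 10) -> torch.Tensor:  # illustrative value
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    sim = src @ tgt.T / temperature      # (B, B) scaled cosine similarities
    pos = sim.diagonal()                 # similarity of each true pair

    # Hard-negative mining: for each source entity, keep only the k most
    # similar non-matching target entities as negatives.
    neg = sim.clone()
    neg.fill_diagonal_(float("-inf"))    # exclude the positive pair
    hard_neg, _ = neg.topk(k, dim=-1)    # (B, k) hardest negatives

    # NT-Xent: -log( exp(pos) / (exp(pos) + sum_j exp(hard_neg_j)) ),
    # expressed as cross-entropy with the positive in column 0.
    logits = torch.cat([pos.unsqueeze(1), hard_neg], dim=1)
    labels = torch.zeros(len(logits), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

Restricting the denominator to the top-k most similar negatives (rather than all in-batch negatives) concentrates the gradient on the hardest-to-separate entity pairs, which is the stated motivation for the mining strategy.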
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 17566