Keywords: entity alignment, knowledge graphs, long-tailed entities, contrastive learning
TL;DR: Contrastive learning-based cross-lingual entity alignment
Abstract: Entity alignment (EA) models rely mostly on the triples and structural information of Knowledge Graphs (KGs), but underperform on sparsely connected long-tailed entities. We address this gap by proposing a model, \textbf{ContrastEA}, that leverages pre-trained language models (LMs), e.g., mE5, to generate entity representations, followed by a novel contrastive learning approach that combines a hard-negative mining strategy, selecting the \textit{top-k} negatives per entity, with an NT-Xent loss to separate challenging entity pairs. In addition, to address the under-representation of long-tailed entities in benchmark datasets, we curate a new dataset from DBpedia comprising long-tailed entities in seven languages (Arabic, German, Portuguese, Italian, Hindi, Russian, and Japanese), each aligned to English, for a total of 154,296 cross-lingual entity pairs. Our results demonstrate that ContrastEA outperforms classic EA models on three benchmark datasets, improving Hits@1 by 6--20 percentage points, and achieves state-of-the-art performance over long-tailed EA models on the curated dataset.
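For concreteness, the sketch below illustrates how an NT-Xent loss with top-k hard-negative mining, as described in the abstract, can be computed over LM-derived entity embeddings. This is a minimal PyTorch sketch, not the authors' implementation; the function name, `temperature`, and `k` values are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of NT-Xent with top-k hard negatives.
# Assumes src_emb[i] and tgt_emb[i] are embeddings of an aligned cross-lingual
# entity pair, e.g. produced by a pre-trained encoder such as mE5.
import torch
import torch.nn.functional as F

def ntxent_topk(src_emb: torch.Tensor,
                tgt_emb: torch.Tensor,
                temperature: float = 0.05,   # illustrative value
                k: int = 10) -> torch.Tensor:  # illustrative value
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    sim = src @ tgt.T / temperature      # (B, B) scaled cosine similarities
    pos = sim.diagonal()                 # similarity of each true pair

    # Hard-negative mining: for each source entity, keep only the k most
    # similar non-matching target entities as negatives.
    neg = sim.clone()
    neg.fill_diagonal_(float("-inf"))    # exclude the positive pair
    hard_neg, _ = neg.topk(k, dim=-1)    # (B, k) hardest negatives

    # NT-Xent: -log( exp(pos) / (exp(pos) + sum_j exp(hard_neg_j)) ),
    # expressed as cross-entropy with the positive in column 0.
    logits = torch.cat([pos.unsqueeze(1), hard_neg], dim=1)
    labels = torch.zeros(len(logits), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

Restricting the denominator to the top-k most similar negatives (rather than all in-batch negatives) concentrates the gradient on the hardest-to-separate entity pairs, which is the stated motivation for the mining strategy.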
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 17566