Abstract: Entity Disambiguation (ED) resolves ambiguous mentions in text by linking them to entities in a knowledge base. A key challenge in ED is entity overshadowing, where dominant entities obscure the correct choice. We propose RAG-ED (Retrieval-Augmented Generation for Entity Disambiguation), a data-efficient three-stage pipeline consisting of a lightweight retriever, a reranker, and a strong large language model-based selector. RAG-ED achieves state-of-the-art performance on entity overshadowing cases, outperforming prior methods by 17 points. The pipeline also maintains competitive performance across standard ED benchmarks, demonstrating its broad applicability. A key advantage of RAG-ED is its ability to identify instances where disambiguation should not be performed, which is particularly useful in settings relying on lightweight retrievers. We conduct extensive analyses and ablation studies on diverse ED datasets, further highlighting the effectiveness of our approach.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: entity disambiguation, entity overshadowing, retrieval augmented generation, large language models
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 1525