Diachronic Named Entity Disambiguation for Ancient Chinese Historical Records

Published: 01 Jan 2023, Last Modified: 25 Mar 2024, ICONIP (11) 2023, Readers: Everyone
Abstract: Named entity disambiguation (NED) is a fundamental task in NLP. Although numerous methods have been proposed for NED in recent years, they overlook the fact that many real-world corpora, such as historical documents or news articles, are diachronic by nature and span long periods of time. As a consequence, most current methods fail to fully exploit the temporal information in the corpora and knowledge bases. To address this issue, we propose a novel model that integrates temporal features into a pretrained language model to make it time-aware, together with a new sample re-weighting scheme for diachronic NED that penalizes highly frequent mention-entity pairs to improve performance on rare and unseen entities. We present WikiCMAG and WikiSM, two new NED datasets annotated on ancient Chinese historical records. Experiments show that our model outperforms existing methods by large margins, demonstrating the effectiveness of integrating diachronic information and of our re-weighting scheme. Our model also achieves competitive performance in out-of-distribution (OOD) settings. WikiSM is publicly available at https://github.com/PKUDHC/WikiSM.