A Cross-Document Coreference Resolution Approach to Low-Resource Languages

Published: 01 Jan 2023, Last Modified: 13 Dec 2023, Venue: KSEM (2) 2023, Readers: Everyone
Abstract: Coreference resolution is an important area of research in natural language processing that deals with identifying and grouping all the expressions in a text that refer to the same entity. This work presents a system for developing a coreference resolution model for the Thai language, based on an existing English clustering-based model. Specifically, we introduce a method to convert Thai text into ECB+-equivalent datasets, which can be used as a benchmark for the Thai language. The paper follows an existing model trained for English coreference resolution, which uses agglomerative clustering to group coreferent entity mentions into clusters across documents. Trained and evaluated on our data, the model achieves a best CoNLL F1 score of 72.87. Finally, we present a comparative study of the effect of manual and automatic span extractors on the performance of the Thai language model. The results indicate that our proposed pipeline, which uses a fine-tuned Longformer model as the encoder, offers a viable alternative to more complex and resource-intensive methods. Our work also suggests that existing NER and entity recognition models can help automate span annotation prior to the subsequent coreference clustering module. This study offers a potential framework for constructing coreference resolution models in other low-resource languages.
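The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the linkage criterion, distance measure, or stopping threshold used by the model, so the average linkage, cosine distance, threshold value, and toy two-dimensional mention vectors below are all assumptions made for illustration; in the described pipeline the vectors would presumably come from the fine-tuned encoder.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_cluster(vectors, threshold):
    """Greedy average-linkage agglomerative clustering (illustrative only).

    Starts with one cluster per mention and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold` (a hypothetical hyperparameter).
    Returns a list of clusters, each a sorted list of mention indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break  # no remaining pair is similar enough to merge
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return [sorted(c) for c in clusters]


# Toy mentions drawn from two hypothetical documents; vectors are made up.
mentions = ["Bangkok", "the capital", "the prime minister", "the PM"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

for cluster in sorted(agglomerative_cluster(vectors, threshold=0.2)):
    print([mentions[i] for i in cluster])
```

With these toy vectors, the two place mentions end up in one cluster and the two person mentions in another. Raising the threshold merges more aggressively, which is the usual trade-off between over-splitting and over-merging coreference chains.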