Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

ACL ARR 2026 May Submission 14397 Authors

26 May 2026 (modified: 02 Jun 2026), ACL ARR 2026 May Submission, Readers: Everyone, License: CC BY 4.0
Keywords: coreference resolution, multilingual coreference resolution, back-translation
Abstract: Coreference resolution is a core NLP task with a broad range of downstream applications, e.g., machine translation, question answering, and document summarization. While the task is well studied in English, comparatively less attention has been dedicated to coreference resolution in other languages, especially low-resource ones. To narrow this gap, we propose a novel coreference resolution pipeline that uses machine translation (MT) from English into a target low-resource language to generate or expand training data. To automatically validate the quality of the translated samples, we back-translate them into English and measure their cosine similarity to the original English samples in the latent space of a BERT model. The resulting similarity scores are integrated into the loss function to weight training samples according to their MT cycle consistency. Extensive experiments on four low-resource languages show that our pipeline brings significant performance gains in coreference resolution. Moreover, our pipeline enables accurate coreference resolution in languages for which no corpora were previously available. We publicly release our code at https://anonymous.4open.science/r/NewCoref-D3B8/.
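The cycle-consistency weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the clipping of negative similarities to zero, and the normalization by the sum of weights are all assumptions made for illustration, and the embeddings are assumed to be precomputed sentence vectors (for example, from a BERT encoder) for the original English sample and its back-translation.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def cycle_weight(orig_emb: Sequence[float], back_emb: Sequence[float]) -> float:
    """Hypothetical per-sample weight from MT cycle consistency.

    Assumption: negative similarities are clipped to zero so that a
    sample never contributes with a negative weight.
    """
    return max(0.0, cosine_similarity(orig_emb, back_emb))


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-sample losses.

    Assumption: weights are normalized by their sum; the paper only
    states that the similarity scores weight the training samples.
    """
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total_weight
```

Under these assumptions, a sample whose back-translation drifts far from the original English text receives a weight near zero and contributes little to the training objective, while a faithfully round-tripped sample contributes at close to full weight.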
Paper Type: Short
Research Area: Discourse and Pragmatics
Research Area Keywords: coreference resolution
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: French, Hungarian, Romanian, Russian
EMNLP 2026 AI Reviewing Experiment: no
Submission Number: 14397