Enhancing Coreference Resolution with LLM-driven Data Augmentation and Adversarial Filtering

ACL ARR 2025 July Submission 61 Authors

21 Jul 2025 (modified: 19 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: Coreference resolution is a fundamental task in natural language processing that links different references to the same entity within a text. Existing models, however, often fail to identify referential relationships reliably when the context is long or the mentions carry complex modifiers. To address these challenges, this study proposes a data augmentation technique that adds adjective phrases to existing text and screens them with a prompt-based adversarial filtering pipeline. Specifically, we generate and insert contextually appropriate adjective phrases through the interaction between few-shot prompting of GPT-4o-mini and a discriminative language model. The grammatical and semantic consistency of these phrases was validated through human evaluation and inter-annotator agreement (IAA) procedures. Combining the resulting synthetic dataset with existing data improved model performance: on the LitBank dataset, the CoNLL-F1 score increased by up to 2.4%, and the synthetic data increased linguistic diversity and the complexity of referential structures. The proposed pipeline is a step towards coreference resolution models that better capture linguistic variety and remain robust under challenging conditions.
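
The generate-then-filter idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names `insert_phrase` and `augment`, the token-index span format, the `score` callable (a stand-in for the discriminative language model) and the `threshold` value are all hypothetical, and the candidate phrases are passed in directly instead of being produced by GPT-4o-mini. The sketch shows one detail any such pipeline has to handle: inserting tokens shifts the positions of later mentions, so the coreference spans must be re-indexed.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]          # inclusive (start, end) token indices of one mention
Clusters = List[List[Span]]     # each inner list holds the mentions of one entity


def insert_phrase(tokens: List[str], clusters: Clusters,
                  position: int, phrase: List[str]) -> Tuple[List[str], Clusters]:
    """Insert `phrase` before the token at `position` and shift mention spans."""
    new_tokens = tokens[:position] + phrase + tokens[position:]
    k = len(phrase)

    def shift(span: Span) -> Span:
        start, end = span
        # Indices at or after the insertion point move right by len(phrase).
        if start >= position:
            start += k
        if end >= position:
            end += k
        return (start, end)

    return new_tokens, [[shift(s) for s in cluster] for cluster in clusters]


def augment(tokens: List[str], clusters: Clusters, position: int,
            candidates: List[List[str]],
            score: Callable[[List[str]], float],
            threshold: float = 0.5) -> List[Tuple[List[str], Clusters]]:
    """Keep only the insertions that the (hypothetical) discriminator accepts."""
    kept = []
    for phrase in candidates:
        new_tokens, new_clusters = insert_phrase(tokens, clusters, position, phrase)
        if score(new_tokens) >= threshold:
            kept.append((new_tokens, new_clusters))
    return kept


# Toy usage: "The doctor" and "she" corefer; try two candidate adjective phrases.
tokens = ["The", "doctor", "said", "she", "was", "tired"]
clusters = [[(0, 1), (3, 3)]]
candidates = [["very", "experienced"], ["purple"]]


def toy_score(toks: List[str]) -> float:
    # Toy stand-in for a discriminator: rejects one hard-coded implausible word.
    return 0.0 if "purple" in toks else 1.0


augmented = augment(tokens, clusters, position=1,
                    candidates=candidates, score=toy_score)
```

In this toy run only the "very experienced" insertion passes the filter, and the mention "she" moves from token index 3 to 5. A real pipeline would replace `toy_score` with a trained discriminator and might apply a different acceptance criterion than a fixed threshold.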

Paper Type: Long

Research Area: Discourse and Pragmatics

Research Area Keywords: coreference resolution

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 61
