Improving Coreference Resolution through Prompting-based Adversarial Filtering and Data Augmentation
Abstract: Coreference resolution is a fundamental task in natural language processing that involves linking different mentions of the same entity within a text. However, models often struggle to reliably identify referential relationships, particularly in cases involving long contexts or complex modifiers. To address these challenges, this study introduces a data augmentation technique that inserts adjectival phrases, driven by a prompting-based adversarial filtering pipeline. Specifically, we generate and insert contextually appropriate adjectival phrases through the interaction between GPT-4o-mini-based few-shot prompting and a discriminative language model. These augmentations are then verified for grammaticality and contextual coherence through human evaluation. The resulting synthetic dataset is combined with the original data to improve coreference resolution performance. Training real-world models on the synthetic dataset yields up to a 1.2% improvement in CoNLL-F1 on the LitBank dataset and up to a 0.4% improvement on the PreCo dataset. Furthermore, the synthetic dataset substantially increases the diversity and complexity of coreference relations. The proposed pipeline represents a step toward coreference resolution models that better capture the linguistic diversity of natural language and remain robust under challenging conditions.
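The generate-then-filter loop described in the abstract can be sketched as follows. This is a minimal illustrative mock-up, not the authors' implementation: `generate_adjective_candidates` stands in for the GPT-4o-mini few-shot prompting step, and `discriminator_score` stands in for the discriminative language model; both are hypothetical toy functions, and the candidate pool, scoring heuristic, and threshold are invented for illustration.

```python
import random

def generate_adjective_candidates(noun, context, k=3):
    # Hypothetical stand-in for GPT-4o-mini few-shot prompting: in the
    # paper's pipeline, an LLM proposes contextually appropriate
    # adjectival phrases for a mention; here we sample from a fixed pool.
    pool = {
        "captain": ["weather-beaten", "gray-haired", "stern"],
        "letter": ["hastily written", "unopened", "ink-stained"],
    }
    candidates = pool.get(noun, ["nondescript"])
    return random.sample(candidates, min(k, len(candidates)))

def discriminator_score(sentence):
    # Hypothetical stand-in for a discriminative language model that
    # rates the grammaticality/coherence of an augmented sentence.
    # Toy heuristic: prefer sentences near a target length.
    return 1.0 / (1.0 + abs(len(sentence.split()) - 12))

def adversarial_filter(sentence, noun, threshold=0.05):
    """Insert candidate adjectival phrases before `noun` and keep only
    the augmentations the discriminator scores above `threshold`."""
    kept = []
    for adj in generate_adjective_candidates(noun, sentence):
        augmented = sentence.replace(noun, f"{adj} {noun}", 1)
        if discriminator_score(augmented) >= threshold:
            kept.append(augmented)
    return kept

print(adversarial_filter("The captain read the letter aloud.", "captain"))
```

In the actual pipeline, surviving augmentations would additionally pass through the human evaluation stage before being merged with the original training data.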
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: multi-word expressions
Contribution Types: Data resources
Languages Studied: English
Submission Number: 640