Improving Coreference Resolution through Prompting-based Adversarial Filtering and Data Augmentation
Abstract: Coreference resolution is a fundamental task in natural language processing that involves linking different mentions of the same entity within a text. However, models often struggle to reliably identify referential relationships, particularly in cases involving long contexts or complex modifiers. To address these challenges, this study introduces a data augmentation technique that inserts adjectival phrases, driven by a prompting-based adversarial filtering pipeline. Specifically, we generate and insert contextually appropriate adjectival phrases through the interaction between GPT-4o-mini-based few-shot prompting and a discriminative language model. These augmentations are then verified for grammaticality and contextual coherence through human evaluation. The resulting synthetic dataset is combined with the original data to improve coreference resolution performance. Training real-world models on the synthetic dataset yields up to a 1.2% improvement in CoNLL-F1 on the LitBank dataset and up to a 0.4% improvement on the PreCo dataset. Furthermore, the synthetic dataset substantially increases the diversity and complexity of coreference relations. The proposed pipeline represents a step toward coreference resolution models that better capture the linguistic diversity of natural language and remain robust under challenging conditions.
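The generate-then-filter loop described in the abstract can be sketched as follows. This is a minimal illustrative mock-up, not the authors' implementation: `generate_adjective_candidates` stands in for the GPT-4o-mini few-shot prompting step, and `discriminator_score` stands in for the discriminative language model; both are hypothetical toy functions, and the candidate pool, scoring heuristic, and threshold are invented for illustration.

```python
import random

def generate_adjective_candidates(noun, context, k=3):
    # Hypothetical stand-in for GPT-4o-mini few-shot prompting: in the
    # paper's pipeline, an LLM proposes contextually appropriate
    # adjectival phrases for a mention; here we sample from a fixed pool.
    pool = {
        "captain": ["weather-beaten", "gray-haired", "stern"],
        "letter": ["hastily written", "unopened", "ink-stained"],
    }
    candidates = pool.get(noun, ["nondescript"])
    return random.sample(candidates, min(k, len(candidates)))

def discriminator_score(sentence):
    # Hypothetical stand-in for a discriminative language model that
    # rates the grammaticality/coherence of an augmented sentence.
    # Toy heuristic: prefer sentences near a target length.
    return 1.0 / (1.0 + abs(len(sentence.split()) - 12))

def adversarial_filter(sentence, noun, threshold=0.05):
    """Insert candidate adjectival phrases before `noun` and keep only
    the augmentations the discriminator scores above `threshold`."""
    kept = []
    for adj in generate_adjective_candidates(noun, sentence):
        augmented = sentence.replace(noun, f"{adj} {noun}", 1)
        if discriminator_score(augmented) >= threshold:
            kept.append(augmented)
    return kept

print(adversarial_filter("The captain read the letter aloud.", "captain"))
```

In the actual pipeline, surviving augmentations would additionally pass through the human evaluation stage before being merged with the original training data.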
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: multi-word expressions
Contribution Types: Data resources
Languages Studied: English
Submission Number: 640