SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction

ACL ARR 2025 February Submission5507 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Joint Entity and Relation Extraction (JERE) models are prone to weak generalization when trained on low-quality data. Data augmentation is a common strategy for improving generalization across domains. However, existing augmentation methods often overlook text relevance and may disrupt semantic structures and dependencies, which makes it difficult to generate augmented data that actually improves generalization. In this paper, we propose Structured Semantic Data Augmentation (SSDAU), a novel method designed to preserve the semantic structure of text during augmentation. SSDAU segments text based on entity labels and uses an encoder to capture context-aware semantic features of the entities. It then performs entity semantic restructuring to generate augmented data. To mitigate potential topic ambiguity and information loss, we apply the BERTopic model to filter out irrelevant topics and keep the augmented data topically consistent. We evaluate SSDAU on datasets with different annotation types and compare it against six popular data augmentation baselines across five representative JERE models. Extensive experiments demonstrate that SSDAU generates data with a consistent semantic structure, improving JERE model performance and surpassing state-of-the-art baselines.
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: data augmentation
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 5507