Self-Adapted Entity-Centric Data Augmentation for Discontinuous Named Entity Recognition

ACL ARR 2024 June Submission 3624 Authors

16 Jun 2024 (modified: 24 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Named Entity Recognition (NER) is a critical task in natural language processing, and identifying discontinuous entities is particularly challenging. This study is the first to explore applying image data augmentation techniques in the preprocessing phase of discontinuous entity recognition, aiming to overcome the limitations of traditional text segmentation methods. Our experiments show that traditional sentence segmentation can incorrectly split cross-sentence discontinuous entities, which harms both model training and recognition accuracy. To address this, we introduce a preprocessing strategy that combines image-style cropping, scaling, and padding to improve the model's ability to recognize discontinuous entities. Experiments on three benchmark datasets, CADEC, ShARe13, and ShARe14, show that our preprocessing method increases the F1 scores of two state-of-the-art grid models by approximately 1% to 2.5%, demonstrating its effectiveness.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Discontinuous entity recognition, grid-tagging, grid preprocessing
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3624
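
The segmentation problem described in the abstract can be illustrated with a minimal sketch. It assumes a discontinuous entity is represented as a list of inclusive token-index fragments and that sentence boundaries are given as the index of each sentence's last token. The function names, the toy sentence, and the token-level cropping window are hypothetical illustrations; they are not the authors' implementation, whose cropping, scaling, and padding operate on grid inputs that the abstract does not specify.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices of one fragment


def sentence_of(token_idx: int, sentence_ends: List[int]) -> int:
    """Return the index of the sentence that contains token_idx.

    sentence_ends[i] is the index of the last token of sentence i (assumed format).
    """
    for sent_id, end in enumerate(sentence_ends):
        if token_idx <= end:
            return sent_id
    raise ValueError("token index lies beyond the last sentence")


def is_split_by_segmentation(fragments: List[Span], sentence_ends: List[int]) -> bool:
    """True if sentence segmentation would scatter the entity over several sentences."""
    sentences = {
        sentence_of(idx, sentence_ends)
        for start, end in fragments
        for idx in (start, end)
    }
    return len(sentences) > 1


def entity_centric_window(fragments: List[Span], n_tokens: int, max_len: int) -> Tuple[int, int]:
    """Hypothetical token-level crop: a half-open window of at most max_len tokens
    that covers every fragment of the entity, roughly centred on it."""
    lo = min(start for start, _ in fragments)
    hi = max(end for _, end in fragments)
    span_len = hi - lo + 1
    if span_len > max_len:
        raise ValueError("entity does not fit into the window")
    slack = max_len - span_len
    start = max(0, lo - slack // 2)
    end = min(n_tokens, start + max_len)
    start = max(0, end - max_len)  # shift left if the window hit the document end
    return start, end


# Toy document (hypothetical): two sentences, ten tokens.
tokens = ["Severe", "pain", "in", "my", "shoulder", ".", "Also", "the", "knee", "."]
sentence_ends = [5, 9]

# "pain in ... knee" spans both sentences, so segmentation would cut it apart.
cross_sentence_entity = [(1, 2), (8, 8)]
# "pain in ... shoulder" stays inside the first sentence.
within_sentence_entity = [(1, 2), (4, 4)]

split_cross = is_split_by_segmentation(cross_sentence_entity, sentence_ends)
split_within = is_split_by_segmentation(within_sentence_entity, sentence_ends)
window = entity_centric_window(cross_sentence_entity, len(tokens), max_len=8)
```

In this toy case the cross-sentence entity is flagged as split, while a window cropped around the entity instead of along sentence boundaries keeps all of its fragments in one training sample.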