Self-Adapted Entity-Centric Data Augmentation for Discontinuous Named Entity Recognition

Abstract: Named Entity Recognition (NER) is a critical task in natural language processing, and identifying discontinuous entities is particularly challenging. This study is the first to explore applying image data augmentation techniques in the preprocessing phase of discontinuous entity recognition, with the aim of overcoming the limitations of traditional text segmentation methods. Our experiments show that traditional sentence segmentation can incorrectly split discontinuous entities that span sentence boundaries, which degrades both model training and entity recognition accuracy. To address this, we introduce a new preprocessing strategy that combines image cropping, scaling, and padding techniques to improve the model's ability to recognize discontinuous entities. Experiments on three benchmark datasets, CADEC, ShARe13, and ShARe14, show that our preprocessing method increases the F1 scores of two state-of-the-art grid-based models by approximately 1% to 2.5%, demonstrating its effectiveness.
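The abstract does not specify how cropping, scaling, and padding are applied, so the following is only a hypothetical Python sketch, not the authors' implementation. It illustrates the failure mode described above: period-based sentence segmentation separates the fragments of a discontinuous entity, while a fixed-size, padded crop window spanning the sentence boundary keeps them together. The function names, the `[PAD]` token, the half-open span convention, and the toy sentence are all assumptions made for illustration; scaling is not modeled.

```python
from typing import List, Tuple

# Hypothetical convention (assumption): half-open token span [start, end).
Span = Tuple[int, int]


def split_at_periods(tokens: List[str]) -> List[Span]:
    """Naive sentence segmentation: close a segment after every '.' token."""
    segments: List[Span] = []
    start = 0
    for i, tok in enumerate(tokens):
        if tok == ".":
            segments.append((start, i + 1))
            start = i + 1
    if start < len(tokens):  # trailing tokens without a final period
        segments.append((start, len(tokens)))
    return segments


def entity_is_intact(fragments: List[Span], segments: List[Span]) -> bool:
    """True if every fragment of the entity lies inside one and the same segment."""
    for seg_start, seg_end in segments:
        if all(seg_start <= s and e <= seg_end for s, e in fragments):
            return True
    return False


def crop_and_pad(tokens: List[str], start: int, size: int, pad: str = "[PAD]") -> List[str]:
    """Hypothetical fixed-size crop: take `size` tokens from `start`, pad the tail."""
    window = tokens[start:start + size]
    return window + [pad] * (size - len(window))


# Toy example (invented): the discontinuous entity "Pain in ... arms" crosses a period.
tokens = ["Pain", "in", "legs", ".", "Also", "in", "arms", "."]
entity = [(0, 2), (6, 7)]
print(entity_is_intact(entity, split_at_periods(tokens)))  # False
window = crop_and_pad(tokens, 0, 10)
print(len(window), window[-1])  # 10 [PAD]
print(entity_is_intact(entity, [(0, 10)]))  # True
```

Here the sentence split yields segments `(0, 4)` and `(4, 8)`, so no single segment contains both fragments. The single padded window of 10 tokens covers both, and the fixed output length is what a batch of equally sized inputs would need.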
