Distill and Calibrate: Denoising Inconsistent Labeling Instances for Chinese Named Entity Recognition

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: Data-driven supervised models for named entity recognition (NER) have achieved significant improvements on standard benchmarks. However, such models often suffer severe performance degradation on large-scale noisy data. Thus, a practical and challenging question arises: Can we leverage only a small amount of relatively clean data to guide the NER model in learning from large-scale noisy data? To answer this question, we focus on the problem of inconsistent labeling instances. We observe that inconsistent labeling instances can be classified into five types of noise, each of which substantially hinders model performance in our experiments. Based on this observation, we propose a simple yet effective denoising framework named Distillation and Calibration for Chinese NER (DCNER). DCNER consists of: (1) a Dual-stream Label Distillation mechanism for distilling five types of inconsistent labeling instances from the noisy data; and (2) a Consistency-aware Label Calibration network for calibrating inconsistent labeling instances based on the relatively clean data. Additionally, we propose the first benchmark for validating the robustness of Chinese NER models to inconsistent labeling instances. Finally, extensive experiments show that our method consistently and significantly outperforms previous methods on the proposed benchmark.
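The abstract describes the distill-then-calibrate pipeline only at a high level. Below is a minimal, hypothetical Python sketch of such a two-stage loop. The function names, the dual-stream agreement heuristic, and the dummy taggers are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of a distill-then-calibrate denoising loop:
# (1) distill suspect instances from the noisy set, (2) relabel them
# with a model fit on the small clean set. All names and the agreement
# heuristic below are assumptions, not DCNER's published code.

def distill(noisy_data, stream_a, stream_b):
    """Split noisy data into likely-clean and suspect instances."""
    clean_like, suspect = [], []
    for tokens, labels in noisy_data:
        pred_a, pred_b = stream_a(tokens), stream_b(tokens)
        # If the two streams agree with each other but disagree with
        # the gold labels, treat the instance as inconsistently labeled.
        if pred_a == pred_b and pred_a != labels:
            suspect.append((tokens, labels))
        else:
            clean_like.append((tokens, labels))
    return clean_like, suspect

def calibrate(suspect, clean_model):
    """Relabel suspect instances using a model trained on clean data."""
    return [(tokens, clean_model(tokens)) for tokens, _ in suspect]

# Toy usage with a dummy tagger; real streams would be trained NER models.
majority = lambda toks: ["O"] * len(toks)
noisy = [(["我", "爱", "北京"], ["O", "O", "B-LOC"])]
kept, flagged = distill(noisy, majority, majority)
recovered = calibrate(flagged, majority)
print(len(kept), len(flagged), recovered)
```

Under this reading, distillation only flags instances and defers to the calibration network for the corrected labels, so the clean set is consulted once per suspect instance rather than used to retrain on the full noisy corpus.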