BED: Boundary-Enhanced Decoder for Chinese Word Segmentation

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023, Readers: Everyone
Keywords: Chinese Word Segmentation, deep learning, natural language processing
TL;DR: An optimized decoder for the CWS model called Boundary-Enhanced Decoder.
Abstract: Chinese Word Segmentation (CWS) is a fundamental step in the Chinese NLP pipeline. In recent years, with the development of deep learning and pre-trained language models, many CWS models built on pre-trained models such as BERT and RoBERTa have been proposed, and CWS performance has improved dramatically. However, CWS remains an open problem that deserves further study; for example, performance on out-of-vocabulary (OOV) words is still poor. To our knowledge, existing CWS approaches mainly focus on optimizing the encoder, for example by incorporating more word information into the encoder or by pre-training on tasks related to CWS, and none attempts to improve the decoder. This paper proposes an optimized decoder for CWS models called the Boundary-Enhanced Decoder (BED). Applied to a model with a BERT encoder and a standard softmax decoder, BED improves Average-F1 by 0.05% and OOV Average-F1 by 0.69% on four benchmark datasets. We also publish our implementation of BED.
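For orientation, the baseline setup the abstract refers to (per-character encoder scores followed by a standard softmax decoder) can be sketched roughly as below. This is not BED and not the authors' code: the BMES tag scheme, the function names, and the example scores are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Assumed tag scheme: B(egin), M(iddle), E(nd), S(ingle).
# The abstract does not name a tag set; BMES is a common choice for CWS.
TAGS = ["B", "M", "E", "S"]

def softmax(scores):
    """Numerically stable softmax over one character's tag scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_decode(logits):
    """Pick the most probable tag for each character independently."""
    tags = []
    for scores in logits:
        probs = softmax(scores)
        best = max(range(len(TAGS)), key=lambda i: probs[i])
        tags.append(TAGS[best])
    return tags

def tags_to_words(chars, tags):
    """Close a word after every E or S tag; flush any unfinished word at the end."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

# Hypothetical encoder scores for the four characters of "我爱北京",
# one row per character, one column per tag in TAGS order.
logits = [
    [0.1, 0.0, 0.2, 2.5],  # 我: highest score on S
    [0.3, 0.1, 0.0, 2.0],  # 爱: highest score on S
    [2.2, 0.1, 0.3, 0.4],  # 北: highest score on B
    [0.2, 0.5, 2.4, 0.1],  # 京: highest score on E
]
tags = softmax_decode(logits)
words = tags_to_words("我爱北京", tags)
```

Because each character's tag is chosen independently, a decoder of this kind has no explicit notion of word boundaries beyond what the encoder scores carry, which is the component the abstract says prior work has left unoptimized.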
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Supplementary Material: zip