Pre-Training with Syntactic Structure Prediction for Chinese Semantic Error Recognition


16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: Existing work on Chinese text error detection focuses mainly on spelling errors and simple grammatical errors. These errors have been studied extensively and are relatively easy for humans to spot. In contrast, Chinese Semantic Error Recognition (CSER) targets more complex semantic errors that even humans cannot easily recognize. Given the complex syntactic relations between words, we find that the syntactic structure captured by a syntax tree can help identify semantic errors. In this paper, we adopt pre-trained models to address CSER. To make the model learn syntactic structure during pre-training, we design a novel pre-training task that predicts the syntactic structure between different words, as derived from the syntax tree. Because no public CSER dataset exists, we build the first high-quality dataset for the task, the Corpus of Chinese Linguistic Semantic Acceptability (CoCLSA), extracted from high school examinations. Experimental results on CoCLSA show that our model, pre-trained with the new task, performs favorably compared with existing pre-trained models.
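The abstract does not say how "syntactic structure between different words" is turned into a prediction target. One plausible encoding, and it is only an assumption here, is the pairwise path length between words in a dependency tree, clipped into a small label set that a classifier over word-pair representations could predict. The sketch below assumes the sentence is already word-segmented and parsed into head indices (`heads[i]` is the head of word `i`, `-1` for the root); the function name `tree_distance_labels` and the `max_dist` clipping are hypothetical and not taken from the paper.

```python
from collections import deque
from typing import List


def tree_distance_labels(heads: List[int], max_dist: int = 4) -> List[List[int]]:
    """Hypothetical pairwise syntactic labels from a dependency tree.

    heads[i] is the index of word i's head, or -1 for the root.
    Each label is the number of tree edges between two words,
    clipped to max_dist so the label set stays small.
    Words in a disconnected fragment keep the label -1.
    """
    n = len(heads)
    # Build an undirected adjacency list from the head pointers.
    adj: List[List[int]] = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    labels: List[List[int]] = []
    for src in range(n):
        # Breadth-first search from each word yields its path length to every other word.
        dist = [-1] * n
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        labels.append([min(d, max_dist) for d in dist])
    return labels


if __name__ == "__main__":
    # "我 喜欢 读 书" (I like reading books): 我->喜欢, 喜欢=root, 读->喜欢, 书->读
    for row in tree_distance_labels([1, -1, 1, 2]):
        print(row)
```

For this toy sentence the matrix is symmetric with zeros on the diagonal, and the word pair 我/书 receives label 3 because the path runs through 喜欢 and 读. In a pre-training setup along these lines, such a matrix would serve as the target for a word-pair classification head on top of the encoder; the parser and label definition actually used in the paper are not stated in the abstract.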
