Abstract: Grammatical error correction (GEC) aims to automatically detect and correct grammatical errors in sentences. With the development of deep learning, neural machine translation-based approaches have become the mainstream for this task. Recently, Chinese GEC has attracted increasing attention. However, Chinese GEC faces two main problems that limit model learning: (1) insufficient data and (2) flexible error forms. In this paper, we attempt to address these limitations by proposing a method called online self-boost learning for Chinese GEC. Online self-boost learning enables the model to generate, within each batch, multiple instances with different errors targeting the model's weaknesses from each original sample, and to learn from the new data immediately without additional I/O. Taking advantage of the features of this new data, a consistency loss is introduced to drive the model to produce similar output distributions for different inputs that share the same target. Our method fully exploits the potential knowledge of the annotated data and can also incorporate unlabeled data, extending it to a semi-supervised method. Extensive experiments and analyses demonstrate the effectiveness of our method. In addition, our method achieves a state-of-the-art result on the Chinese GEC benchmark.
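To make the consistency-loss idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a sequence-to-sequence model whose forward pass returns per-token logits, and two corrupted variants of the same source sentence that share one gold target. The names `model`, `noisy_a`, `noisy_b`, and `tgt` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, noisy_a, noisy_b, tgt):
    """Symmetric KL divergence between the model's token distributions for
    two differently-corrupted inputs that share the same gold target.
    Assumes model(src, tgt) returns logits of shape (batch, tgt_len, vocab)."""
    logits_a = model(noisy_a, tgt)
    logits_b = model(noisy_b, tgt)
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    # KL(p || q) and KL(q || p), averaged for a symmetric penalty that
    # pushes the two output distributions toward each other.
    kl_pq = F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
    kl_qp = F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
    return 0.5 * (kl_pq + kl_qp)
```

In training, a term like this would be added to the usual cross-entropy loss on the gold target, weighting it with a scalar hyperparameter; that combination is an assumption about the setup, as the abstract does not specify the objective's exact form.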