Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: In this paper, we investigate the GEC sequence tagging architecture, focusing on ensembling recent cutting-edge Transformer encoders in their Large configurations. We propose ensembling models by majority vote on span-level edits, because this approach is tolerant of differences in model architecture and vocabulary size. Our best ensemble achieves a new SOTA result, an F_0.5 score of 76.05 on BEA-2019 (test), even without pre-training on synthetic datasets. We also distill the trained ensemble to generate new synthetic training datasets, "Troy-Blogs" and "Troy-1BW". Our best single sequence tagging model, pre-trained on the generated Troy- datasets in combination with the publicly available synthetic PIE dataset, achieves a near-SOTA F_0.5 score of 73.21 on BEA-2019 (test). The code, datasets, and trained models are publicly available.
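To illustrate the majority-vote idea mentioned in the abstract, the following is a minimal sketch (not the authors' implementation), assuming each tagger's output has already been converted into (start, end, replacement) span-level edits over the source tokens; the function name and vote threshold are hypothetical.

```python
# Hypothetical sketch: majority-vote ensembling of span-level edits
# produced by several GEC sequence taggers.
from collections import Counter
from typing import List, Tuple

# An edit is represented as (start, end, replacement) over source tokens.
Edit = Tuple[int, int, str]

def majority_vote(edit_sets: List[List[Edit]], min_votes: int) -> List[Edit]:
    """Keep only the edits proposed by at least `min_votes` of the models."""
    votes = Counter(edit for edits in edit_sets for edit in set(edits))
    kept = [edit for edit, count in votes.items() if count >= min_votes]
    # Sort by span position so edits can be applied left to right.
    return sorted(kept, key=lambda e: (e[0], e[1]))

# Example: three models; an edit survives if at least two of them propose it.
model_a = [(2, 3, "went"), (5, 5, ",")]
model_b = [(2, 3, "went")]
model_c = [(2, 3, "goes"), (5, 5, ",")]
print(majority_vote([model_a, model_b, model_c], min_votes=2))
# -> [(2, 3, 'went'), (5, 5, ',')]
```

Because the vote is taken over edits rather than over model-internal tags or vocabulary items, models with different encoders and tagsets can be combined directly, which is the tolerance property the abstract refers to.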