Abstract: Large neural network language models trained on huge corpora of text have achieved state-of-the-art results on several natural language tasks. We propose algorithms for grammatical error detection (GED) based on the pre-trained language model GPT-2. Our approach frames GED as an anomaly detection problem and requires no additional training data. We leverage GPT-2's next-word probabilities and word embeddings to detect anomalous sentences, and evaluate the results on the English learners' corpora Lang-8, CoNLL-2014, FCE, and BEA-2019. Our methods achieve a competitive area under the receiver operating characteristic curve (AUROC) on these corpora when detecting ungrammatical sentences. An experimental comparison of normalization methods shows that rule-driven methods are the most effective.
Paper Type: long