Abstract: Large neural network language models trained on huge corpora of text have achieved state-of-the-art results on several natural language tasks. We propose algorithms for grammatical error detection (GED) based on the pre-trained language model GPT-2. Our approach frames GED as an anomaly detection problem and requires no additional training data. We leverage GPT-2's next-word probabilities and word embeddings to detect anomalous sentences, and evaluate the results on the English learners' corpora Lang-8, CoNLL-2014, FCE, and BEA-2019. Our methods achieve a competitive area under the receiver operating characteristic curve (AUROC) on these corpora when detecting ungrammatical sentences. An experimental comparison of normalization methods shows that rule-driven methods are the most effective.
Paper Type: long