Keywords: Grammatical Error Correction, Revision, Resource, Evaluation
Abstract: Natural language processing (NLP) technology has rapidly improved automated grammatical error correction (GEC), and the GEC community has begun to explore document-level revision. There are two major obstacles to going beyond automated sentence-level GEC to NLP-based document-level revision support: (1) few public corpora contain document-level revisions annotated by professional editors, and (2) because the space of possible revisions is effectively unbounded, it is infeasible to collect all possible references and evaluate revision quality against them. To address these challenges, this paper proposes a new document revision corpus, Text Revision of ACL papers (TETRA), in which professional editors revised academic papers sampled from the ACL Anthology that contain trivial grammatical errors. This corpus enables us to focus on document-level and paragraph-level edits, e.g., edits related to coherence and consistency. In addition, we investigate reference-less and interpretable meta-evaluation methods for detecting quality improvements resulting from document revisions. We show the uniqueness of TETRA compared with existing document revision corpora and demonstrate that a fine-tuned pre-trained language model can discriminate the quality of revised documents even when the differences are subtle.
Paper Type: long
Research Area: Resources and Evaluation