WikiComments: Leveraging Revision Comments to Extract Annotated Grammatical Correction Data from Wikipedia
TL;DR: We develop WikiComments, a data extraction method which leverages the revision comments of Wikipedia edits to extract grammatical error correction training data.
Abstract: We develop WikiComments, a data extraction method which leverages the revision comments of Wikipedia edits to extract grammatical error correction training data. WikiComments improves on the previous Wikipedia extraction method by extracting only data that are explicitly grammatical in nature. Our method produces larger quantities of data (up to 143% more) than existing benchmarks in languages such as German and Russian. We show that augmenting Korean training data with our extracted data leads to state-of-the-art results. Additionally, we show that augmenting minimal amounts of gold-annotated data with WikiComments improves performance on up to 92% of German error types.
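The core idea of comment-based filtering can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Revision` record, the `GRAMMAR_KEYWORDS` list, and the helper names `is_grammatical` and `extract_pairs` are all hypothetical. The abstract does not specify which comment patterns WikiComments matches, and a real multilingual pipeline would need per-language keyword lists (for German, Korean, and Russian), not this English-only one.

```python
import re
from dataclasses import dataclass

# Hypothetical, English-only keyword list for illustration; the actual
# comment patterns used by WikiComments are not given in the abstract.
GRAMMAR_KEYWORDS = ("grammar", "grammatical", "typo", "spelling", "agreement")

# Match any keyword at a word start, case-insensitively (e.g. "Typo", "typos").
_pattern = re.compile(r"\b(" + "|".join(GRAMMAR_KEYWORDS) + r")", re.IGNORECASE)


@dataclass
class Revision:
    comment: str  # the editor's revision comment (edit summary)
    before: str   # sentence text before the edit
    after: str    # sentence text after the edit


def is_grammatical(comment: str) -> bool:
    """Return True if the revision comment mentions a grammar-related keyword."""
    return _pattern.search(comment) is not None


def extract_pairs(revisions):
    """Keep (source, target) pairs only from revisions that actually change
    the text and whose comment signals a grammatical fix."""
    return [
        (r.before, r.after)
        for r in revisions
        if r.before != r.after and is_grammatical(r.comment)
    ]
```

For example, given revisions commented "fix grammar", "add citation", and "Typo", only the first and third would yield training pairs. Filtering on the editor's own comment, rather than on the shape of the diff alone, is what lets the extracted data be restricted to edits that are explicitly grammatical.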
Paper Type: long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: German, Korean, Russian