Context Beyond Grammar: Synonym Substitution for Korean Grammatical Error Correction in Specialized Texts
Abstract: Many previous studies on grammatical error correction (GEC) have focused primarily on language learner corpora, which consist of texts written by learners of a non-native language. In this study, we address a GEC task that involves selecting contextually appropriate words in texts containing domain-specific vocabulary. We propose the UniGEC (Unified-Replacement GEC) dataset, which combines the outputs of multiple models to estimate, from token occurrence probabilities, how likely a synonym is to be substituted for a given keyword. Our experiments show that UniGEC poses a more challenging task than language learner corpora, and that the performance gap widens as the number of synonyms increases. We also found substantial performance variation across domains. These results highlight the need for further research on synonym substitution in specialized texts to extend GEC to a wider range of scenarios.
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: Grammatical Error Correction, Synonym, Context
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Korean
Submission Number: 1045