Context Beyond Grammar: Synonym Substitution for Korean Grammatical Error Correction in Specialized Texts

ACL ARR 2024 December Submission1045 Authors

15 Dec 2024 (modified: 05 Feb 2025), ACL ARR 2024 December Submission, CC BY 4.0

Abstract: Many previous studies on grammatical error correction (GEC) have focused primarily on language learner corpora, which consist of texts written by learners of a non-native language. In this study, we address a GEC task that involves selecting contextually appropriate words in texts containing domain-specific vocabulary. We propose the UniGEC (Unified-Replacement GEC) dataset, which combines results from multiple models to determine the likelihood of substituting synonyms for specific keywords, based on token occurrence probabilities. Our experiments show that UniGEC presents a more challenging task than language learner corpora. We observe that the performance gap widens as the number of synonyms increases. Furthermore, we find significant performance variation across domains, highlighting the need for further exploration of synonym substitution in specialized texts to extend GEC to a wider range of scenarios.

Paper Type: Short

Research Area: NLP Applications

Research Area Keywords: Grammatical Error Correction, Synonym, Context

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: Korean

Submission Number: 1045
