Abstract: Spell checking is the task of correcting errors in a sentence that arise from various causes. Although the field has been studied extensively, most research has focused on a few widely studied languages. In this study, we focus on Korean and its linguistic characteristics, in particular the tendency of a single character to be misspelled in many different ways. We therefore categorize spelling errors found in real-world corpora and automatically construct an error corpus based on their statistical patterns. Applying this corpus to a pre-trained large language model (LLM), we confirm that using the introduced spelling errors as few-shot examples can help with error correction. We hope that this study contributes to the automatic construction of error corpora and to prompt-based approaches for other low-resource languages.
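The abstract describes two steps: injecting errors according to statistical patterns, then using the resulting (erroneous, correct) pairs as few-shot examples. It does not give the error taxonomy, the frequencies, or the prompt format. Below is a minimal sketch under those gaps. The error types, the weights in `ERROR_TYPE_WEIGHTS`, the helper names, and the `Input:`/`Output:` template are all hypothetical stand-ins, not the authors' method. The one fixed fact it relies on is the standard Unicode formula for precomposed Hangul syllables, which shows how one character can go wrong in several ways. A single syllable splits into an initial consonant, a medial vowel, and an optional final consonant, and each part can be corrupted independently.

```python
import random

# Hypothetical error types and frequencies; the paper derives its own
# statistics from real-world corpora, which are not reproduced here.
ERROR_TYPE_WEIGHTS = {
    "final_consonant": 0.5,  # replace the final consonant (batchim) of one syllable
    "vowel": 0.3,            # replace the medial vowel of one syllable
    "spacing": 0.2,          # delete one space between words
}

# Unicode layout of precomposed Hangul syllables:
# code point = 0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
N_MEDIAL, N_FINAL = 21, 28


def is_hangul(ch):
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def decompose(ch):
    """Split a precomposed Hangul syllable into (initial, medial, final) indices."""
    code = ord(ch) - HANGUL_BASE
    return code // (N_MEDIAL * N_FINAL), (code // N_FINAL) % N_MEDIAL, code % N_FINAL


def compose(initial, medial, final):
    """Inverse of decompose()."""
    return chr(HANGUL_BASE + (initial * N_MEDIAL + medial) * N_FINAL + final)


def inject_error(sentence, rng):
    """Return a copy of `sentence` with one error of a randomly sampled type."""
    error_type = rng.choices(
        list(ERROR_TYPE_WEIGHTS), weights=list(ERROR_TYPE_WEIGHTS.values())
    )[0]
    if error_type == "spacing":
        spaces = [i for i, c in enumerate(sentence) if c == " "]
        if not spaces:
            return sentence
        i = rng.choice(spaces)
        return sentence[:i] + sentence[i + 1:]
    positions = [i for i, c in enumerate(sentence) if is_hangul(c)]
    if not positions:
        return sentence
    i = rng.choice(positions)
    initial, medial, final = decompose(sentence[i])
    if error_type == "final_consonant":
        final = rng.choice([f for f in range(N_FINAL) if f != final])
    else:  # "vowel"
        medial = rng.choice([m for m in range(N_MEDIAL) if m != medial])
    return sentence[:i] + compose(initial, medial, final) + sentence[i + 1:]


def build_few_shot_prompt(clean_sentences, query, k=3, seed=0):
    """Build a prompt whose demonstrations are synthetic (erroneous -> correct) pairs."""
    rng = random.Random(seed)
    blocks = ["Correct the spelling errors in the Korean sentence."]
    for clean in clean_sentences[:k]:
        blocks.append(f"Input: {inject_error(clean, rng)}\nOutput: {clean}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_few_shot_prompt(["오늘 날씨가 좋다", "책을 읽었다"], "어제 비가 왓다", k=2))
```

Sampling a uniformly random replacement jamo is only a placeholder. A corpus-driven generator would instead sample replacements from observed confusion frequencies, for example a final consonant swapped for one that sounds alike.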
Paper Type: short
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Korean