ETA: Enriching Typos Automatically from Real-World Corpora for Few-Shot Learning

Anonymous

16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone

Abstract: Spell checking is the task of correcting errors in a sentence that arise from various causes. Despite continuous research in this field, prior work has often focused on a small number of widely studied languages. In this study, we focus on the Korean language and its linguistic characteristics, particularly the fact that a single character can be incorrect in diverse ways. We therefore categorize spelling errors from real-world corpora and automatically construct an error corpus based on their statistical patterns. Using a pre-trained large language model (LLM), we confirm that the resulting spelling errors, when used as samples for few-shot learning, can be helpful in error correction tasks. We hope that this study contributes to the automatic construction of error corpora and to prompt-based approaches for other low-resource languages.
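
To make the idea concrete, the following is a minimal sketch of how a single Korean character can be corrupted in several ways and how the resulting typo/correction pairs might be arranged as few-shot samples. It is not the paper's implementation: the error categories, the weights in `ERROR_WEIGHTS`, the prompt layout, and all function names are hypothetical and chosen only for illustration. The one fixed fact it relies on is the standard Unicode arithmetic for precomposed Hangul syllables (19 initials, 21 medials, 28 finals starting at U+AC00).

```python
import random

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
NUM_INITIALS, NUM_MEDIALS, NUM_FINALS = 19, 21, 28  # Unicode constants

# Hypothetical error categories and weights; a real system would
# estimate these from error statistics observed in a corpus.
ERROR_WEIGHTS = {"initial": 0.2, "medial": 0.5, "final": 0.3}


def decompose(syllable: str) -> tuple[int, int, int]:
    """Split a precomposed syllable into (initial, medial, final) indices."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(code - HANGUL_BASE, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    return initial, medial, final


def compose(initial: int, medial: int, final: int) -> str:
    """Inverse of decompose."""
    return chr(HANGUL_BASE + (initial * NUM_MEDIALS + medial) * NUM_FINALS + final)


def corrupt_syllable(syllable: str, rng: random.Random) -> str:
    """Replace one component of the syllable, chosen by category weight."""
    initial, medial, final = decompose(syllable)
    category = rng.choices(list(ERROR_WEIGHTS), weights=list(ERROR_WEIGHTS.values()))[0]
    if category == "initial":
        initial = rng.choice([i for i in range(NUM_INITIALS) if i != initial])
    elif category == "medial":
        medial = rng.choice([m for m in range(NUM_MEDIALS) if m != medial])
    else:  # final index 0 means "no final consonant"
        final = rng.choice([f for f in range(NUM_FINALS) if f != final])
    return compose(initial, medial, final)


def make_typo(sentence: str, rng: random.Random) -> str:
    """Corrupt exactly one Hangul syllable in the sentence, if any exists."""
    positions = [i for i, ch in enumerate(sentence)
                 if HANGUL_BASE <= ord(ch) <= HANGUL_LAST]
    if not positions:
        return sentence
    pos = rng.choice(positions)
    return sentence[:pos] + corrupt_syllable(sentence[pos], rng) + sentence[pos + 1:]


def build_few_shot_prompt(clean_sentences: list[str], query: str,
                          rng: random.Random) -> str:
    """Pair synthetic typos with their corrections, then append the query."""
    shots = [f"Input: {make_typo(s, rng)}\nOutput: {s}" for s in clean_sentences]
    shots.append(f"Input: {query}\nOutput:")
    return "\n\n".join(shots)
```

Because each syllable has three independently replaceable components, even this simplified scheme yields many distinct erroneous forms per character, which is the property the abstract highlights for Korean.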

Paper Type: short

Research Area: NLP Applications

Contribution Types: NLP engineering experiment, Approaches to low-resource settings

Languages Studied: Korean
