ETA: Enriching Typos Automatically from Real-World Corpora for Few-Shot Learning

Anonymous

16 Feb 2024
ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: Spell checking is the task of correcting errors in a sentence that arise from various causes. Despite continuous research in this field, most work has focused on a few widely studied languages. In this study, we focus on Korean and its linguistic characteristics, in particular the tendency for a single character to be misspelled in many different ways. We therefore categorize spelling errors from real-world corpora and automatically construct an error corpus based on their statistical patterns. When we use these errors as few-shot examples for a pre-trained large language model (LLM), we confirm that they can help in error correction tasks. We hope that this study contributes to the automatic construction of error corpora and to prompt-based approaches for other low-resource languages.
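The abstract does not say how error statistics are turned into synthetic typos or how the few-shot prompt is formatted. The sketch below is a rough illustration only and is not the authors' ETA pipeline. It assumes that errors can be modeled as jamo-level substitutions inside a precomposed Hangul syllable, which is one way a "single character can be misspelled in many different ways." The syllable arithmetic follows the Unicode Hangul composition formula. Everything else is hypothetical: the `CONFUSIONS` table with its placeholder weights, the `inject_typos` and `build_prompt` helpers, and the prompt wording. Real weights would be estimated from corpus error counts.

```python
import random

# Unicode Hangul syllable arithmetic: syllable = 0xAC00 + (lead*21 + vowel)*28 + tail
SBASE, VCOUNT, TCOUNT = 0xAC00, 21, 28
NCOUNT = VCOUNT * TCOUNT          # 588 syllables per lead consonant
SCOUNT = 19 * NCOUNT              # 11,172 precomposed syllables

def decompose(ch):
    """Return (lead, vowel, tail) jamo indices, or None for non-Hangul chars."""
    s = ord(ch) - SBASE
    if not 0 <= s < SCOUNT:
        return None
    return s // NCOUNT, (s % NCOUNT) // TCOUNT, s % TCOUNT

def compose(lead, vowel, tail):
    """Rebuild a precomposed syllable from jamo indices."""
    return chr(SBASE + (lead * VCOUNT + vowel) * TCOUNT + tail)

# HYPOTHETICAL confusion table: (slot, correct index) -> (wrong indices, weights).
# The weights are placeholders, not statistics from the paper.
CONFUSIONS = {
    ("vowel", 1): ([5], [1.0]),    # ㅐ -> ㅔ
    ("vowel", 5): ([1], [1.0]),    # ㅔ -> ㅐ (게 -> 개)
    ("vowel", 10): ([11], [1.0]),  # ㅙ -> ㅚ (돼 -> 되)
    ("vowel", 11): ([10], [1.0]),  # ㅚ -> ㅙ (되 -> 돼)
    ("tail", 6): ([4], [1.0]),     # ㄶ -> ㄴ (않 -> 안)
}
SLOTS = ("lead", "vowel", "tail")

def inject_typos(sentence, rate, rng):
    """Hypothetical helper: corrupt each Hangul syllable with probability `rate`,
    substituting one jamo according to the weighted confusion table."""
    out = []
    for ch in sentence:
        parts = decompose(ch)
        if parts is not None and rng.random() < rate:
            # Collect every slot of this syllable that has a known confusion.
            options = [(i, CONFUSIONS[(slot, idx)])
                       for i, (slot, idx) in enumerate(zip(SLOTS, parts))
                       if (slot, idx) in CONFUSIONS]
            if options:
                i, (wrong, weights) = rng.choice(options)
                new = list(parts)
                new[i] = rng.choices(wrong, weights=weights)[0]
                ch = compose(*new)
        out.append(ch)
    return "".join(out)

def build_prompt(pairs, query):
    """Hypothetical helper: format (erroneous, corrected) pairs as few-shot examples."""
    lines = ["Correct the spelling errors in each sentence."]
    for wrong, right in pairs:
        lines += [f"Input: {wrong}", f"Output: {right}"]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)
```

For example, `inject_typos("그렇게 하면 안 돼요.", 1.0, random.Random(0))` returns `"그렇개 하면 안 되요."`. Only syllables whose jamo appear in the table are changed. Each resulting pair (corrupted sentence, original sentence) could then be passed to `build_prompt` as a few-shot demonstration. Sampling substitutions from weighted counts, rather than uniformly over all jamo, is what lets the synthetic errors reflect the frequency profile observed in real text.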
Paper Type: short
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Korean