Spelling Corrector Is Just Language Learner

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: Spelling correction can be learned in a self-supervised manner from monolingual data alone.
Abstract: This paper studies spelling correction via purely unsupervised learning, meaning that no annotated errors are available in the training data, a pivotal issue that has received broad attention in the community. Our intuition is that humans are naturally good correctors with almost no exposure to parallel sentences, which contrasts with current unsupervised methods that rely heavily on confusion sets. In this paper, we demonstrate that learning a spelling correction model is identical to learning a language model from monolingual data alone, and then decoding it in a larger search space. We propose \emph{Denoising Decoding Correction (D\textsuperscript{2}C)}, which selectively imposes noise on the source sentence to recover the underlying correct characters. Our method substantially elicits the correction ability of language models, including both BERT-based models and large language models (LLMs), and yields significant gains on Chinese and Japanese spelling correction benchmarks. We also show that this self-supervised learning scheme generally outperforms the use of confusion sets, because it avoids introducing error characters into the training data, which can corrupt the patterns of the target domains.
Paper Type: long
Research Area: NLP Applications
Contribution Types: Approaches to low-resource settings
Languages Studied: Chinese, Japanese
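
To make the denoising-decoding intuition concrete, below is a minimal sketch using a BERT-style masked language model: each source character is in turn replaced with noise (a mask) and re-predicted by the LM, and the prediction overrides the original character only when the model is confident that it differs. The checkpoint name, the masking scheme, and the confidence threshold are illustrative assumptions and not the paper's exact D\textsuperscript{2}C procedure.

```python
# A minimal sketch of the denoising-decoding idea, assuming a HuggingFace
# masked LM ("bert-base-chinese" is used here for illustration). The paper's
# actual D2C noising and decoding procedure may differ in its details.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()


def correct(sentence: str, threshold: float = 0.9) -> str:
    """Mask (noise) each character in turn; keep the LM's prediction only
    when it is confident and differs from the observed character."""
    chars = list(sentence)
    for i in range(len(chars)):
        noised = chars.copy()
        noised[i] = tokenizer.mask_token  # impose noise at position i
        inputs = tokenizer("".join(noised), return_tensors="pt")
        mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id) \
            .nonzero(as_tuple=True)[0].item()
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_idx]
        probs = torch.softmax(logits, dim=-1)
        top_prob, top_id = probs.max(dim=-1)
        candidate = tokenizer.convert_ids_to_tokens(top_id.item())
        # Overwrite only when the model strongly prefers a different character.
        if top_prob.item() > threshold and candidate != chars[i]:
            chars[i] = candidate
    return "".join(chars)


print(correct("今天天气很好"))  # prints the (possibly corrected) input sentence
```

Note that this sketch decodes greedily and character by character; the confidence threshold is the only guard against over-correction, whereas the paper's method searches a larger decoding space.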