DRA: A Dual Retrieval Architecture for Domain Chinese Spelling Check

Published: 2025, Last Modified: 12 Jan 2026, NLPCC (2) 2025, CC BY-SA 4.0
Abstract: Chinese Spelling Check (CSC), a foundational task in natural language processing and Chinese computing, aims to detect and correct misspelled characters in Chinese text. Existing CSC methods, however, adapt poorly to new domains, because supervised learning approaches require large amounts of labeled data. LLM-based methods alleviate this problem through few-shot in-context learning (ICL), but they struggle to generalize across domain-specific tasks because they lack domain knowledge. To address these limitations, this paper proposes a novel dual retrieval architecture (DRA) for domain-specific CSC. Unlike existing LLM-based methods that rely on task examples, DRA integrates two core components: (1) a robust retriever that extracts contextually relevant domain knowledge from external document corpora, and (2) an example retriever that provides correction pattern guidance. Because input sentences contain misspelled characters, the robust retriever mitigates retrieval errors via two synergistic strategies: (i) multi-modal modeling of phonetic, glyphic, and semantic features to link characters with plausible candidates; and (ii) confusion-set augmented training to enhance robustness against error patterns. Extensive experiments on three domain-specific CSC benchmarks (LAW, MED, and ODW) demonstrate the effectiveness of DRA: it achieves correction F1 scores of 86.6%, 77.2%, and 93.1%, respectively, surpassing previous state-of-the-art methods by significant margins. Ablation studies confirm the critical role of the robust retriever in enhancing contextual accuracy and reducing dependency on domain-specific annotations.
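To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It illustrates (a) confusion-set augmentation, where clean sentences are corrupted with confusable characters to create noisy queries for retriever training, and (b) assembling an LLM prompt from two retrieval results, one for domain knowledge and one for correction examples. Everything here is an assumption made for illustration: the names `CONFUSION_SET`, `corrupt`, `char_overlap`, `retrieve`, and `build_prompt` are hypothetical, the confusion set is a three-entry toy, and the character-overlap score merely stands in for the learned phonetic, glyphic, and semantic retriever described in the abstract.

```python
import random

# Hypothetical toy confusion set: each character maps to characters it is
# easily confused with (by shape or sound). Real confusion sets are far larger.
CONFUSION_SET = {
    "原": ["源", "员"],
    "告": ["浩", "诰"],
    "诉": ["拆", "斥"],
}


def corrupt(sentence, confusion_set, rate, rng):
    """Swap each confusable character for a confusion-set neighbour with probability `rate`."""
    out = []
    for ch in sentence:
        candidates = confusion_set.get(ch)
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


def char_overlap(query, text):
    """Jaccard overlap of character sets; a stand-in for a learned similarity score."""
    q, t = set(query), set(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query, pool, key, k):
    """Return the k items of `pool` whose key(item) text overlaps most with `query`."""
    return sorted(pool, key=lambda item: char_overlap(query, key(item)), reverse=True)[:k]


def build_prompt(sentence, documents, examples, k=1):
    """Combine retrieved domain knowledge and retrieved correction examples into one prompt."""
    knowledge = retrieve(sentence, documents, key=lambda d: d, k=k)
    shots = retrieve(sentence, examples, key=lambda e: e[0], k=k)
    lines = ["Domain knowledge:"]
    lines += [f"- {doc}" for doc in knowledge]
    lines.append("Correction examples:")
    lines += [f"- {src} -> {tgt}" for src, tgt in shots]
    lines.append(f"Correct the spelling errors: {sentence}")
    return "\n".join(lines)
```

In this sketch, `corrupt` would be applied to clean domain sentences to produce (noisy query, relevant document) pairs for training a retriever, while `build_prompt` shows how the outputs of the two retrievers could be placed side by side in a few-shot prompt. A plain character-overlap score degrades when the query itself contains misspellings, which is the weakness the robust retriever in the abstract is meant to address.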