Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters

Anonymous

Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters

Anonymous

16 Feb 2024ACL ARR 2024 February Blind SubmissionReaders: Everyone

Abstract: Writing assistance aims to improve the correctness and quality of input texts, with character checking being crucial in detecting and correcting wrong characters. In the real world where handwriting occupies the vast majority, characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies only focus on misspelled characters that can be represented by computer text encoding systems, thereby ignoring faked characters which are more common and difficult. To break through this dilemma, we present $\textbf{Visual-C}$$^3$, a human-annotated $\textbf{Visual}$ $\textbf{C}$hinese $\textbf{C}$haracter $\textbf{C}$hecking dataset with faked and misspelled Chinese characters. To the best of our knowledge, Visual-C$^3$ is the first real-world visual and the largest human-crafted dataset for the Chinese character checking scenario. Additionally, we also propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. As the first study focusing on Chinese faked characters, the Visual-C$^3$ dataset and the baseline methods will be publicly available to facilitate further research in the community.

Paper Type: long

Research Area: Resources and Evaluation

Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources

Languages Studied: Chinese

0 Replies

Loading