CalligraphicOCR for Chinese Calligraphy Recognition

ACL ARR 2025 May Submission1536 Authors

17 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: With thousands of years of history, calligraphy serves as one of the representative symbols of Chinese culture. A growing body of work tries to digitize calligraphy by recognizing its textual content for better preservation and propagation. However, previous work is limited to isolated single-character recognition, which not only requires impractical manual splitting into characters but also discards the rich context information that could serve as a supplementary signal. To this end, we construct a pioneering end-to-end calligraphy recognition benchmark dataset. The dataset is challenging both visually, owing to variations such as different writing styles, and textually, owing to issues such as the domain shift in semantics. We further propose CalligraphicOCR (COCR), equipped with calligraphic image augmentation and an action-based corrector that target the root challenges of this setting. Experiments demonstrate the advantage of our proposed model over cutting-edge baselines and underscore the necessity of introducing this new setting, laying a solid foundation for protecting and propagating these already scarce resources.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: resources for less-resourced languages, less-resourced languages, endangered languages
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Chinese
Submission Number: 1536