Self-learning Compositional Representations for Zero-shot Chinese Character Recognition

27 Sept 2024 (modified: 15 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Chinese Character Recognition, Object-centric Representations
Abstract: Chinese character recognition has been a longstanding research topic and remains essential in visual tasks such as ancient manuscript recognition. It faces numerous challenges, particularly the recognition of zero-shot characters. Existing zero-shot Chinese character recognition methods primarily rely on radical or stroke decomposition. However, radical-based methods still struggle with zero-shot radicals, while stroke-based methods have difficulty perceiving fine-grained information. Moreover, previous methods can hardly generalize to characters of other languages. In this paper, we propose a novel Self-learning Compositional Representation method for zero-shot Chinese Character Recognition (SCR-CCR). SCR-CCR learns compositional components automatically from data, without being tied to human-defined radical or stroke decomposition schemes. SCR-CCR follows a pretraining-inference paradigm. First, we train a Character Slot Attention (ChSA) module with a pure feature reconstruction loss to parse appropriate components from character images. Then, in the inference stage, we recognize zero-shot characters without fine-tuning or retraining by comparing the components of input and example images. To evaluate the proposed method, we conduct zero-shot character recognition experiments. The results show that SCR-CCR outperforms previous methods in most character and radical zero-shot settings. In particular, visualization experiments indicate that the components learned by SCR-CCR reflect the structure of characters in an interpretable way and can be used to recognize Japanese and Korean characters.
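The abstract describes a two-stage pipeline: a slot-attention-style module that parses component slots from character features, and an inference step that matches the slots of an input image against those of per-class example images. The following is a minimal sketch of that pipeline under assumed design choices; the class and function names (CharacterSlotAttention, recognize_zero_shot), slot counts, and matching rule are illustrative and are not taken from the paper.

```python
# Hedged sketch (not the authors' implementation) of the two stages described above:
# (1) slot-attention-style parsing of component slots from patch features, which
#     would be trained with a feature reconstruction loss (decoder omitted here), and
# (2) zero-shot recognition by comparing input slots with example-image slots.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharacterSlotAttention(nn.Module):
    """Parses a fixed number of component slots from patch features (Slot Attention style)."""

    def __init__(self, num_slots: int = 8, dim: int = 256, iters: int = 3):
        super().__init__()
        self.num_slots, self.iters = num_slots, iters
        self.scale = dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_sigma = nn.Parameter(torch.ones(1, 1, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, D) patch features from an image encoder
        B, N, D = feats.shape
        feats = self.norm_in(feats)
        k, v = self.to_k(feats), self.to_v(feats)
        slots = self.slots_mu + self.slots_sigma * torch.randn(
            B, self.num_slots, D, device=feats.device
        )
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Competition over slots: softmax along the slot dimension
            attn = torch.softmax(torch.einsum("bsd,bnd->bsn", q, k) * self.scale, dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-8)
            updates = torch.einsum("bsn,bnd->bsd", attn, v)
            slots = self.gru(updates.reshape(-1, D), slots.reshape(-1, D)).reshape(B, self.num_slots, D)
        return slots  # (B, num_slots, D): learned compositional components


def recognize_zero_shot(input_slots: torch.Tensor, example_slots: dict) -> str:
    """Assigns the class whose example-image components best match the input components.

    input_slots: (S, D) slots of the input image.
    example_slots: {label: (S, D)} slots of one example image per class.
    """
    best_label, best_score = None, float("-inf")
    for label, ref in example_slots.items():
        # (S, S) pairwise cosine similarities between input and example slots
        sim = F.cosine_similarity(input_slots.unsqueeze(1), ref.unsqueeze(0), dim=-1)
        score = sim.max(dim=1).values.mean().item()  # greedy slot-to-slot matching
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

During pretraining, the slots would be decoded back to the patch features and optimized with a reconstruction loss (e.g., mean squared error); at inference only the frozen encoder, slot module, and the matching function above are needed, which is what allows recognition of unseen characters without retraining.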
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11741