A rotation robust shape transformer for cartoon character recognition

Qi Jia, Xinyu Chen, Yi Wang, Xin Fan, Haibin Ling, Longin Jan Latecki

Published: 01 Jan 2024, Last Modified: 20 Jul 2025. Vis. Comput. 2024. License: CC BY-SA 4.0

Abstract: Accurate recognition of cartoon characters helps animators design and create cartoon scenarios from existing cartoon materials. Current deep learning approaches are sensitive to image rotation and rely heavily on rich textures that rarely exist in cartoon figures. To address this problem, we focus on the distinct nature of shapes, which mostly encode the geometric structure of contours and yield more discriminative and robust features than textures. We propose a rotation-robust shape transformer for cartoon character recognition. Because the filters in deep networks can hardly detect discriminative gradient information in cartoon figures, we leverage multi-scale shape context (SC) to capture the geometry of contour sampling points rather than differences in gray level. Further, we propose a rotation-invariant positional encoding to describe the geometric relations among local shape features. The contributions of the different scales of SC templates are learned by an attention-based transformer encoder. The resulting network learns shape information effectively from cartoon contours alone. This simple design attains nearly 100% recognition accuracy, outperforming both handcrafted and deep learning methods on the proposed challenging Cartoon dataset and on traditional datasets. In particular, we achieve 86.19% recognition accuracy on the rotation test set, 58.30 percentage points higher than the state-of-the-art methods. Moreover, we develop an online cartoon character recognition application for animation scenarios.
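
As an illustration of the shape context idea mentioned in the abstract, the following is a minimal NumPy sketch of a log-polar shape context descriptor computed at several radii. It is not the authors' implementation: the function names, the bin counts, the choice of radii, and the use of the point-to-centroid direction as the angular reference (one simple way to make the histograms rotation invariant) are all assumptions made for this example. The abstract does not specify how the scales or the positional encoding are defined.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12, r_max=2.0):
    """Log-polar histograms of neighbour positions, one per contour point.

    Hypothetical helper: bin layout and centroid-based angular reference
    are assumptions for this sketch, not details taken from the paper.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    off_diag = ~np.eye(n, dtype=bool)

    diff = pts[None, :, :] - pts[:, None, :]           # diff[i, j] = pts[j] - pts[i]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    dist = dist / dist[off_diag].mean()                 # scale normalisation

    # Measure angles relative to the direction from each point to the centroid,
    # so a global rotation shifts both angles equally and cancels out.
    centroid = pts.mean(axis=0)
    ref = np.arctan2(centroid[1] - pts[:, 1], centroid[0] - pts[:, 0])
    ang = (np.arctan2(diff[..., 1], diff[..., 0]) - ref[:, None]) % (2 * np.pi)

    # Outer edges of the radial bins, doubling up to r_max.
    r_edges = r_max * 2.0 ** np.arange(-(n_r - 1), 1)
    r_bin = np.searchsorted(r_edges, dist, side="left")  # n_r means "beyond r_max"
    t_bin = np.minimum((ang / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hist = np.zeros((n, n_r, n_theta))
    valid = (r_bin < n_r) & off_diag                    # drop self-pairs and far points
    i_idx, _ = np.nonzero(valid)
    np.add.at(hist, (i_idx, r_bin[valid], t_bin[valid]), 1)
    return hist.reshape(n, n_r * n_theta)

def multi_scale_shape_context(points, radii=(0.5, 1.0, 2.0)):
    """Stack descriptors computed at several outer radii (assumed 'scales')."""
    return np.stack([shape_context(points, r_max=r) for r in radii], axis=1)
```

For a contour with N sampled points, `multi_scale_shape_context` returns an array of shape (N, 3, 60) under the default settings. Each row could then serve as a per-point set of scale-specific tokens for an attention-based encoder, although how the paper actually combines the scales is not described in the abstract.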