Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model

Published: 01 Jan 2023, Last Modified: 11 Nov 2024 · IJCNN 2023 · CC BY-SA 4.0
Abstract: In recent years, advances have been made in anime-related classification and object detection. However, these works do not take full advantage of the tags and textual descriptions attached to anime data at creation time, which restricts both the methods and the data to a single modality and consequently leads to unsatisfactory performance. In this paper, we propose a novel multimodal deep learning network for anime character identification and tag prediction that exploits multimodal data. Considering that in many realistic scenarios the text annotations accompanying anime may be missing, we introduce the concept of curriculum learning into transformers to enable inference with only one modality. Another challenge is that no existing dataset meets our demand for large-scale multimodal deep learning. To train the proposed network, we construct a new anime dataset, Dan:mul, which contains over 1.6M images spread across more than 14K categories, with an average of 24 tags per image. To the best of our knowledge, this is the first dataset specifically designed for multimodal anime character identification. With the trained network, we can identify the anime characters in images and generate the related tags. Experiments show that our method achieves state-of-the-art performance on Dan:mul for anime character identification.
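The abstract describes a curriculum-learning scheme that lets the transformer run with a single modality at inference time, but the paper's actual architecture and schedule are not reproduced here. The following is only a minimal PyTorch sketch of one plausible reading: a shared transformer encoder over image and text tokens, trained while withholding the text modality with a probability that grows over training, so the model learns to predict characters and tags from images alone. All names, dimensions, heads, and the linear drop schedule (`MultimodalAnimeNet`, `text_drop_prob`, etc.) are hypothetical illustrations, not the authors' method.

```python
# Hypothetical sketch of curriculum-style modality dropout in a multimodal
# transformer; architecture and schedule are assumptions, not from the paper.
import torch
import torch.nn as nn


class MultimodalAnimeNet(nn.Module):
    def __init__(self, img_dim=2048, txt_vocab=30000, d_model=512,
                 num_chars=14000, num_tags=10000):  # placeholder sizes
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)        # image features -> tokens
        self.txt_embed = nn.Embedding(txt_vocab, d_model)  # tag/description tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.char_head = nn.Linear(d_model, num_chars)  # character identification
        self.tag_head = nn.Linear(d_model, num_tags)    # multi-label tag prediction

    def forward(self, img_feats, txt_ids=None):
        # img_feats: (B, N_img, img_dim); txt_ids: (B, N_txt) or None at inference
        tokens = self.img_proj(img_feats)
        if txt_ids is not None:
            tokens = torch.cat([tokens, self.txt_embed(txt_ids)], dim=1)
        h = self.encoder(tokens).mean(dim=1)  # pool over the joint token sequence
        return self.char_head(h), self.tag_head(h)


def text_drop_prob(step, total_steps):
    # Curriculum: start with both modalities, end mostly image-only.
    return min(1.0, step / total_steps)


# Usage: randomly withhold the text modality with increasing probability.
model = MultimodalAnimeNet()
img = torch.randn(2, 49, 2048)          # e.g. a 7x7 CNN feature grid per image
txt = torch.randint(0, 30000, (2, 24))  # ~24 tags per image, per the abstract
for step in range(3):
    use_text = torch.rand(()) > text_drop_prob(step, total_steps=10)
    char_logits, tag_logits = model(img, txt if use_text else None)
```

Under this reading, the same encoder weights serve both the multimodal and image-only regimes, which is what makes single-modality inference possible when text annotations are missing.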