Abstract: Cartoon parsing, which segments the body parts of characters in cartoon images, is an important task for cartoon-centric applications. It remains challenging because of the complex appearances, abstract drawing styles, and irregular structures of cartoon characters. In this paper, a novel approach, named CartoonNet, is proposed for cartoon parsing, in which semantic consistency and structure correlation are integrated to address the visual diversity and structural complexity of cartoon characters. A memory-based semantic consistency module is designed to learn the diverse appearances exhibited by cartoon characters. The memory bank stores features of diverse samples and retrieves those related to each new sample for consistency learning, with the aim of improving the semantic reasoning capability of the network. A self-attention mechanism is employed to conduct consistency learning among the diverse body parts belonging to the retrieved samples and the new samples. To capture the intricate structural information of cartoon images, a structure correlation module is proposed. Leveraging graph attention networks and a main body-aware mechanism, this module models structural correlation, allowing the approach to parse cartoon images with complex structures. Experiments conducted on cartoon parsing and human parsing datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art approaches for cartoon parsing and achieves competitive performance on human parsing.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation, [Experience] Multimedia Applications, [Experience] Art and Culture
Relevance To Conference: Cartoon characters, as an expressive and creative visual form, are closely related to the research and applications of multimedia and multimodal analysis. Specifically, cartoon characters are an essential component of various multimedia applications, such as the metaverse, animated films, virtual reality, music games, and artistic creations. Cartoon parsing, as a research field, can play a crucial role in advancing these applications and driving the progress of the related domains.
Furthermore, the algorithms and techniques developed for cartoon parsing can be closely associated with multimodal research tasks such as referring image segmentation. These synergies can provide valuable theoretical and algorithmic foundations for the multimodal analysis of complex and semantically rich visual content.
The interdisciplinary and cross-media nature of cartoon parsing research presents unique opportunities for innovation and integration. By leveraging the complementary strengths of multimedia and multimodal analysis, cartoon parsing can contribute to the construction of more intelligent, interactive, and comprehensive information service systems. This convergence of research fields has the potential to drive significant advances in the understanding, generation, and utilization of expressive and creative visual content.
Submission Number: 719