Abstract: Cartoon animal parsing aims to segment body parts such as heads, arms, legs and tails of cartoon animals. Unlike previous parsing tasks, cartoon animal parsing faces new challenges, including irregular body structures, abstract drawing styles and diverse animal categories. Because these challenges stem from the spatial and structural properties of cartoon animals, existing methods struggle to address them. To tackle these challenges, a novel spatial learning and structural modeling network, named CAPNet, is proposed for cartoon animal parsing. It addresses three critical problems: spatial perception, structure modeling and spatial-structural consistency learning. A spatial-aware learning module integrates deformable convolutions to learn the spatial features of diverse cartoon animals, and a multi-task edge and center-point prediction mechanism is incorporated to capture intricate spatial patterns. A structural modeling method, which integrates a graph neural network with a shape-aware relation learning module, is proposed to model the complex structural representations of cartoon animals. To mitigate the significant differences among animal species, a spatial and structural consistency learning strategy is proposed to capture and learn feature correlations across species. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art methods.
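As a rough illustration of the multi-task edge and center-point prediction mentioned in the abstract, the sketch below shows one common way such auxiliary supervision targets could be derived from a dense part-label mask. It is not CAPNet's implementation: the abstract does not say how edges or centers are defined, so the 4-neighbour boundary rule, the per-part centroid, the use of label 0 for background, and the names edge_targets, center_targets and toy_mask are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical helpers (not from the paper): build auxiliary targets for
# edge and center-point prediction from a part-label mask (list of lists of ints).
# Assumption: label 0 is background; every other int is one body part.


def edge_targets(mask):
    """Binary boundary map: 1 where any 4-neighbour has a different part label."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] != mask[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def center_targets(mask, background=0):
    """Centroid (row, col) of every non-background part, keyed by part label."""
    sums = defaultdict(lambda: [0, 0, 0])  # [row sum, col sum, pixel count]
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label != background:
                acc = sums[label]
                acc[0] += y
                acc[1] += x
                acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2]) for label, acc in sums.items()}


# Toy 4x4 mask: part 1 (e.g. a head) above part 2 (e.g. a body), background around.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
]
```

On the toy mask, part 1 gets its center at (1.5, 1.5) and part 2 at (3.0, 1.5). A centroid is only one possible definition: for curved or irregular cartoon parts (a coiled tail, say) it can fall outside the part itself, which is one reason a real model might choose a different center definition.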
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation, [Experience] Art and Culture, [Experience] Multimedia Applications
Relevance To Conference: Cartoon characters, as a vibrant and imaginative visual medium, are intricately intertwined with multimedia and multimodal analysis research and applications. They serve as integral elements in various multimedia platforms, including the metaverse, animated films, virtual reality, music games, and artistic endeavors. Research in cartoon parsing holds the potential to significantly impact these applications and drive progress in related domains.
Moreover, the methodologies and tools developed for cartoon parsing can synergize with multimodal research areas like Referring Image Segmentation, offering valuable theoretical and algorithmic frameworks for analyzing complex visual content rich in semantics.
The interdisciplinary and cross-media nature of cartoon parsing research presents unique opportunities for innovation and integration. By capitalizing on the complementary strengths of multimedia and multimodal analysis, cartoon parsing can contribute to the development of more intelligent, interactive, and comprehensive information service systems in the future. This convergence of research fields has the potential to propel advancements in understanding, generating, and utilizing expressive visual content, thereby shaping the evolving landscape of media and communication technologies.
Supplementary Material: zip
Submission Number: 1074