MagicCartoon: 3D Pose and Shape Estimation for Bipedal Cartoon Characters

Published: 20 Jul 2024, Last Modified: 04 Aug 2024, Venue: MM2024 Poster, License: CC BY 4.0
Abstract: A 3D model of a character can be estimated by regressing pose and shape parameters from image data. Reconstructing 3D cartoon characters is challenging because of their diverse visual representations and postural variations. This paper proposes MagicCartoon, a dual-branch structure for 3D bipedal cartoon character estimation that models pose and shape independently through feature decoupling. Because shape parameters correlate with category differences, a hybrid feature fusion technique is introduced that integrates the global features of the original image with the corresponding local features expressed by the puzzle image, making differences in shape parameters less abstract to learn. To semantically align the image and geometric feature spaces, an iterative geometric-guided feedback loop is proposed so that the pose of the modeling result is consistent with the image. Moreover, a feature consistency loss is designed that exploits augmented training data containing the same character in different postures and different characters in the same posture. It strengthens the correlation between the features extracted by the backbone network and the specific task. Experiments on the 3DBiCar dataset demonstrate that MagicCartoon outperforms state-of-the-art methods.
Relevance To Conference: In this study, we introduce a novel approach for creating 3D models of cartoon characters from images. The core connections to multimedia and multimodal research are threefold. Firstly, our model's input incorporates multiple modalities, encompassing both 2D images and RaBit parameterized 3D models. Secondly, we devise a multimodal fusion technique that enables semantic alignment between 2D images and 3D models within a shared feature space, thus facilitating precise pose modeling. Thirdly, our method holds promise for a range of multimedia-reliant applications. Specifically, it reduces the need for extensive and intricate manipulation of 3D modeling software, enabling users to quickly create 3D cartoon characters from images for virtual reality (VR) applications, video production, sharing, and other forms of multimedia creation.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Experience] Multimedia Applications
Submission Number: 591