Towards Labeling-free Fine-grained Animal Pose Estimation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | CC BY 4.0
Abstract: In this paper, we aim to identify denser, finer-grained animal joints. The lack of standardized joint definitions across animal pose estimation (APE) datasets, e.g., AnimalPose with 20 joints, AP-10k with 17 joints, and TigDog with 19 joints, presents a significant challenge but also an opportunity to make full use of the available annotations. This paper tackles this new non-standardized annotation problem, aiming to learn fine-grained (e.g., 24 or more joints) pose estimators from datasets that lack complete annotations. To handle the unannotated joints, we propose FreeNet, which comprises a base network and an adaptation network connected through a circuit feedback learning paradigm. FreeNet increases the adaptation network's tolerance to unannotated joints via body part-aware learning, which optimizes the sampling frequency of joints according to their detection difficulty, and it improves the base network's predictions for unannotated joints through feedback learning. This feedback exploits the cognitive differences of the adaptation network between non-standardized labeled data and large-scale unlabeled data. Experimental results on three non-standard datasets demonstrate the effectiveness of our method for fine-grained APE.
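The abstract does not spell out how unannotated joints are kept out of training or how detection difficulty turns into sampling frequency. A common baseline for the same setting, shown here only as a hedged sketch and not as FreeNet itself, is to place every dataset's joints into one unified index space, mask out the joints a given dataset does not label, and sample harder joints more often. The 24-joint unified set, the function names (`annotation_mask`, `masked_keypoint_loss`, `difficulty_sampling_weights`, `sample_joints`), and the error-proportional weighting are all hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical size of the unified fine-grained joint set (the abstract
# targets "24 or more joints"; 24 is an illustrative choice).
NUM_UNIFIED_JOINTS = 24

Point = Tuple[float, float]


def annotation_mask(annotated: Sequence[int],
                    num_joints: int = NUM_UNIFIED_JOINTS) -> List[float]:
    """1.0 for unified joints the source dataset labels, 0.0 for the rest."""
    mask = [0.0] * num_joints
    for j in annotated:
        mask[j] = 1.0
    return mask


def masked_keypoint_loss(pred: Sequence[Point], target: Sequence[Point],
                         mask: Sequence[float]) -> float:
    """Mean squared Euclidean error over annotated joints only.

    Unannotated joints (mask 0.0) are excluded, so a dataset with 17 labeled
    joints is never penalized on the 7 unified joints it does not define.
    """
    total = 0.0
    for (px, py), (tx, ty), m in zip(pred, target, mask):
        total += m * ((px - tx) ** 2 + (py - ty) ** 2)
    count = sum(mask)
    return total / count if count > 0 else 0.0


def difficulty_sampling_weights(per_joint_error: Sequence[float],
                                eps: float = 1e-6) -> List[float]:
    """Normalize per-joint errors into sampling probabilities.

    Joints with a larger running error (harder to detect) are sampled more
    often. Proportional weighting is an assumption, not FreeNet's rule.
    """
    scaled = [e + eps for e in per_joint_error]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_joints(weights: Sequence[float], k: int, seed: int = 0) -> List[int]:
    """Draw k joint indices (with replacement) according to the weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)


if __name__ == "__main__":
    # Toy example: a dataset that labels only unified joints 0 and 2 of 4.
    mask = annotation_mask([0, 2], num_joints=4)
    pred = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 9.0)]
    target = [(0.0, 0.0)] * 4
    print(masked_keypoint_loss(pred, target, mask))  # 1.0
    print(difficulty_sampling_weights([1.0, 3.0], eps=0.0))  # [0.25, 0.75]
```

Masking keeps the large errors on joints 1 and 3 from affecting the loss, since the toy dataset never labels them. The small `eps` keeps a joint with zero recorded error from dropping out of sampling entirely.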
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation, [Experience] Multimedia Applications
Relevance To Conference: Our research aligns well with the multimedia theme, especially with "Understanding Multimedia Content" and "Multimedia Applications."
- Understanding Multimedia Content: This area involves processing and interpreting various data types, including images and videos. Our work on animal pose estimation falls within this scope, as it interprets visual data (images of animals) to locate body joints, enabling a more nuanced understanding of what those images show.
- Multimedia Applications: Animal pose estimation (APE) aims to localize the joint positions on animal bodies, and its applications extend to various multimedia domains. APE has important implications for behavior understanding, wildlife conservation, individual animal identification, and the generation of animal-related multimedia content. For example, accurate pose estimation can be combined with audio analysis to interpret animal behavior, potentially offering a cheaper and quicker alternative to clinical examinations and vital sign monitoring. It can also contribute to more immersive and interactive multimedia content involving animals, enhancing user engagement.
- Additionally, this conference has published many papers on related research topics, such as human pose estimation, hand pose estimation, and animal face alignment.
Supplementary Material: zip
Submission Number: 2405