Accurate-PGNet: Learning to Assemble Perceptual Body Parts for Accurate Human Skeleton Establishment

Renjie Zhang, Di Lin, Xin Wang, George Baciu, C. L. Philip Chen, Ping Li

Published: 01 Jan 2025, Last Modified: 16 May 2025; IEEE Trans. Multim. 2025; License: CC BY-SA 4.0

Abstract: Human skeleton establishment aims to accurately localize the human body in RGB images and build a complete human skeleton for applications such as action recognition, video surveillance, and human-computer interaction. Exploiting the inherent structure of the human body, many recent methods group related body parts and use deep convolutional networks to learn the visual context of each part group. However, the grouping strategies in these methods rely heavily on prior knowledge of human body shape and overlook important relationships between parts. In this paper, we introduce the Accurate Part Grouping Network (Accurate-PGNet), a novel network that groups body parts hierarchically in a data-driven manner. Unlike previous methods, we use neural architecture search (NAS) to optimize the architecture of Accurate-PGNet and to determine how the body parts are grouped. The resulting grouping respects the diverse visual patterns of the parts, producing groups that contain different sets of body parts. From each group, we learn a visual feature map that helps capture the correlations between its parts and predict their locations. The feature maps of the part groups are then merged hierarchically to capture the higher-order context of parts in larger groups. Extensive evaluations on challenging benchmarks demonstrate that Accurate-PGNet achieves state-of-the-art results.
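The abstract describes merging group feature maps bottom-up so that larger groups see the context of the smaller groups inside them. Below is a minimal, hypothetical sketch of that merging step in plain Python, assuming toy 2×2 maps, a hand-written grouping tree, and element-wise averaging as the merge operator. The abstract does not specify the actual merge operation, and in Accurate-PGNet the grouping itself is found by NAS rather than written by hand. `PartGroup`, `merge_hierarchically`, and `argmax_location` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Grid = List[List[float]]


@dataclass
class PartGroup:
    """Hypothetical node of a part-grouping tree.

    A leaf holds the body parts of one group and a feature map learned from
    them; an internal node merges the maps of its child groups.
    """
    parts: Tuple[str, ...]
    feature: Optional[Grid] = None
    children: List["PartGroup"] = field(default_factory=list)


def elementwise_mean(maps: List[Grid]) -> Grid:
    """Average equally sized 2-D maps cell by cell (an assumed merge operator)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
            for r in range(rows)]


def merge_hierarchically(node: PartGroup) -> Grid:
    """Bottom-up merge: leaves return their own map; internal nodes store
    and return the mean of their children's merged maps."""
    if not node.children:
        if node.feature is None:
            raise ValueError(f"leaf group {node.parts} has no feature map")
        return node.feature
    node.feature = elementwise_mean([merge_hierarchically(c) for c in node.children])
    return node.feature


def argmax_location(heatmap: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the strongest response (first one on ties)."""
    cells = [(r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))]
    return max(cells, key=lambda rc: heatmap[rc[0]][rc[1]])


# Hypothetical grouping for illustration only; the paper searches for it with NAS.
left_arm = PartGroup(("l_shoulder", "l_elbow", "l_wrist"),
                     feature=[[0.0, 0.25], [0.75, 0.0]])
right_arm = PartGroup(("r_shoulder", "r_elbow", "r_wrist"),
                      feature=[[0.5, 0.0], [0.25, 0.5]])
upper_body = PartGroup(left_arm.parts + right_arm.parts,
                       children=[left_arm, right_arm])

merged = merge_hierarchically(upper_body)
print(merged)                   # [[0.25, 0.125], [0.5, 0.25]]
print(argmax_location(merged))  # (1, 0)
```

Averaging keeps merged values on the same scale as the leaf maps regardless of how many children a group has. A learned network would more plausibly concatenate or convolve the group feature maps, which this sketch does not attempt to model.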