Abstract: In this paper, we aim to capture the structure dependency of human joints and improve the localization accuracy of invisible joints. We propose a novel framework: Structure-aware Graph Convolutional Network (SA-GCN) for crowd pose estimation, which can be divided into two components: Sample Pose Net and Refined Pose Net. Firstly, Sample Pose Net includes a multi-scale feature fusion module, which uses multi-scale features to capture small-scale characters and extract the global “rough” pose as much as possible. Secondly, channel and spatial attention are injected into the multi-scale feature fusion module to strengthen the characteristics of small-scale characters. Finally, graph convolution obtained by the disentangled several parallel sub-graph convolution modules in Refined Pose Net. Global and structural advantages of graph convolution are more conducive to predicting difficult points in sample Pose. In addition, SA-GCN obtains lower parameters compared with the popular pose estimation networks. By which, we apply a novel framework SA-GCN to get feature maps for proposal and refinement, respectively. Comprehensive experiments demonstrate that the proposed method achieves superior pose estimation results on two benchmark datasets, CrowdPose and MSCOCO. Moreover, SA-GCN significantly outperforms state-of-the-art performance on CrowdPose and almost always generates plausible human pose predictions.
External IDs:dblp:journals/tjs/WangL23
Loading