Abstract: Humans have an impressive ability to reliably perceive pose from semantic descriptions (e.g., both arms up or left leg bent). To leverage these transitive structural characteristics for human pose estimation, we explore a part descriptor that qualitatively describes structural consistency across varied appearances. In addition, we use a fixed bone constraint to fully exploit structural knowledge. In this paper, we propose an effective network that jointly models the part descriptor and bone heatmaps as structural information, learning dynamically from compositional features. Specifically, the part descriptor distills structural consistency and acts as external guidance via feature injection, while the introduced bone detection acts as internal guidance through multi-level feature fusion. The proposed method thus enables the network to incorporate higher-level structure into lower-level keypoint detection models effectively, yielding more robust features for pose estimation. We evaluate the proposed method on the LSP, MPII, LIP, COCO, and CrowdPose datasets. The experimental results show that it outperforms most state-of-the-art methods on these widely used benchmarks with lower complexity.
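The abstract does not say how a bone heatmap is constructed. As a purely illustrative sketch, one plausible target assigns each pixel a Gaussian response based on its distance to the line segment joining two keypoints. The function names `point_to_segment_distance` and `bone_heatmap`, the `sigma` parameter, and the Gaussian form are all assumptions made for this example and are not taken from the paper.

```python
import math


def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment A-B (hypothetical helper)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate bone: both joints coincide, so measure distance to that point.
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp onto the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def bone_heatmap(joint_a, joint_b, width, height, sigma=1.5):
    """Render a height x width map that peaks along the bone joint_a-joint_b.

    Assumption: the response is a Gaussian of the pixel-to-bone distance.
    Joints are (x, y) pixel coordinates; the result is indexed as [y][x].
    """
    (ax, ay), (bx, by) = joint_a, joint_b
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            d = point_to_segment_distance(x, y, ax, ay, bx, by)
            row.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
        heatmap.append(row)
    return heatmap


# Usage: a horizontal bone between two hypothetical joints on a 9 x 5 grid.
hm = bone_heatmap((2, 2), (6, 2), width=9, height=5)
```

In this sketch the response equals 1 on the bone itself and decays symmetrically away from it, so a network supervised with such maps would be encouraged to respond along limbs rather than only at joints.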