Abstract: Bottom-up multi-person pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present a Higher-Resolution Network (HigherHRNet) to learn high-resolution feature pyramids. Equipped with mutli-resolution supervision for training and multiresolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize the keypoints, especially for small person, more precisely. The feature pyramid in HigherHRNet consists of the feature map output from HRNet and the upsampled higherresolution one through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium person on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a state-of-the-art result of 70.5% AP on COCO test-dev without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. The code and models are available at this https URL.
0 Replies
Loading