Keywords: Human Pose Estimation; Confidence; Learning to Rank
Abstract: While 2D human pose estimation (HPE) has achieved strong advances in keypoint localization, the ranking of pose confidence scores has received little attention. These scores are central to evaluation protocols such as mean Average Precision (mAP) and to applications like pose selection, where predictions are ordered or filtered based on their scores. Yet, the quality of these rankings is often suboptimal, limiting overall performance. In this paper, we ask whether explicitly optimizing the ranking of confidence scores, without altering keypoint coordinates, can improve HPE. To this end, we formulate confidence ranking as a pairwise ordering problem, which, to our knowledge, has not been directly explored in HPE. We further propose a rank loss that upper bounds the negative expected likelihood of correct orderings and guarantees that reducing the loss leads to higher-quality rankings. To validate this formulation, we present Ranked Confidence Net (RCNet), a lightweight module with only 0.07M parameters that refines confidence rankings post hoc while leaving keypoints unchanged. RCNet serves as both a conceptual demonstration of the value of ranking and a practical tool with negligible computational cost. Experiments on COCO show consistent improvements across strong HPE baselines, with an average gain of 0.7 mAP (ranging from 0.3 to 1.8), and consistent gains on CrowdPose. These results establish confidence ranking, independent of coordinate refinement, as an effective and previously overlooked direction for advancing human pose estimation.
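Illustrative sketch (not from the paper): the abstract frames confidence ranking as a pairwise ordering problem, and the idea can be made concrete with a generic RankNet-style logistic loss. This is not the rank loss proposed in the paper, whose exact form the abstract does not give. The function name `pairwise_rank_loss` and the use of a per-pose quality value (for example, OKS against ground truth) as the ordering target are assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_rank_loss(scores, qualities):
    """Mean logistic loss over all pose pairs whose qualities differ.

    scores:    predicted confidence scores, one per pose (hypothetical input).
    qualities: ground-truth quality per pose, e.g. OKS (assumed target).
    A generic RankNet-style loss, NOT the loss defined in the paper.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if qualities[i] == qualities[j]:
            continue  # ties carry no ordering information
        # hi is the pose that should be ranked above lo
        hi, lo = (i, j) if qualities[i] > qualities[j] else (j, i)
        margin = scores[hi] - scores[lo]
        # log(1 + exp(-margin)), written in a numerically stable form
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


qualities = [0.9, 0.5, 0.2]
well_ordered = pairwise_rank_loss([2.0, 1.0, 0.0], qualities)
reversed_order = pairwise_rank_loss([0.0, 1.0, 2.0], qualities)
```

In this sketch the loss depends only on score differences between poses, so it penalizes misordered pairs without touching keypoint coordinates; scores that agree with the quality ordering (`well_ordered`) yield a lower loss than scores that reverse it (`reversed_order`).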
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1327