Multi-person pose estimation based on graph grouping optimization

Qingzhi Zeng, Yingsong Hu, Dan Li, Dongya Sun

Published: 2023, Last Modified: 02 Mar 2026Multim. Tools Appl. 2023EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Multi-person pose estimation has been an increasingly popular topic with the advancements of all kinds of computer vision and human-machine interaction tasks. This study field could further enhance the understanding of human poses and activities. The current mainstream multi-person pose estimation methods are generally divided into two categories: top-down and bottom-up methods. Although top-down methods are capable of achieving better performance by simplifying the problem to single-person pose estimation, while this strategy somewhat greatly increases the time complexity as a trade-off for better accuracy. The bottom-up methods could directly locate all the keypoints in the image, which can be potentially more effective and can be made real-time. However, most of the current bottom-up methods have separated the detection and grouping of keypoints into two independent steps. This greatly hindered the overall performance and computation efficiency of the algorithms. To address this issue, our study proposes an end-to-end bottom-up framework for multi-person pose estimation. Using the HRNet as the backbone structure, we add a deconvolution module to acquire high-resolution feature maps in the keypoints proposal stage. The graph neural network is leveraged in the grouping stage, which is integrated to the backbone so that the whole framework can be trained in an end-to-end manner. Using the keypoint candidates as nodes, two discriminators are exploited to supervise the grouping process. Lastly, a graph-based pose optimization algorithm is explored to refine the results. Experiments on the COCO and CrowdPose datasets show that our method achieves better accuracy and greatly reduce the computation time as well.

External IDs:dblp:journals/mta/ZengHLS23