A distribution-aware 2D multi-person pose estimation method with attention mechanisms

Published: 2025, Last Modified: 25 Jan 2026, Multim. Syst. 2025, CC BY-SA 4.0
Abstract: 2D multi-person pose estimation has a wide range of applications in computer vision. Despite significant progress, occlusion, crowded scenes, and the sparse distribution of keypoints remain the main challenges for this task. In this work, we introduce a distribution-aware single-stage method integrated with attention mechanisms to address these issues. First, we propose the Depthwise Deformable Attention Module (DDAM), which provides an adaptive receptive field and enables the network to focus on regions containing critical information. Second, we propose a hybrid attention module that enhances the network’s ability to capture global contextual information. To handle occlusion and dense crowds in multi-person scenarios, we devise a part-based representation strategy that significantly improves the robustness of pose estimation. Additionally, building on the success of flow-based generative models, our model learns the actual distribution of keypoints, which improves keypoint regression. During training, we dynamically balance the weights of multiple related losses rather than setting them manually, which further boosts the model’s performance. Extensive experiments on various benchmarks demonstrate the effectiveness and efficiency of the proposed method: it is competitive in accuracy with current state-of-the-art methods while remaining efficient and robust in complex scenes.
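The abstract does not say which scheme is used to balance the loss weights dynamically. As a rough illustration only, the sketch below assumes one common choice, uncertainty-style weighting, in which each loss L_i gets a learnable log-variance s_i and the total is the sum of exp(-s_i) * L_i + s_i. The function names, the loss values, and the learning rate are all hypothetical and are not taken from the paper.

```python
import math


def balanced_loss(losses, log_vars):
    """Combine losses as sum_i exp(-s_i) * L_i + s_i (assumed scheme, not the paper's)."""
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def update_log_vars(losses, log_vars, lr=0.1):
    """One gradient-descent step on the log-variances, holding the losses fixed."""
    # d/ds [exp(-s) * L + s] = 1 - exp(-s) * L
    return [s - lr * (1.0 - math.exp(-s) * loss) for loss, s in zip(losses, log_vars)]


# Hypothetical loss values for illustration (e.g. a heatmap loss and a regression loss).
losses = [4.0, 0.5]
log_vars = [0.0, 0.0]  # equal weights at the start

for _ in range(200):
    log_vars = update_log_vars(losses, log_vars)

# Effective weight applied to each loss after adaptation.
weights = [math.exp(-s) for s in log_vars]
```

With the losses held fixed, each log-variance settles where exp(-s_i) * L_i = 1, so the larger loss ends up with the smaller weight. In real training the loss values change every step and the log-variances would be optimized jointly with the network parameters.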