JPA: A Joint-Part Attention for Mitigating Overfocusing on 3D Human Pose Estimation

Published: 01 Jan 2024 · Last Modified: 10 Jan 2025 · PRCV (6) 2024 · CC BY-SA 4.0
Abstract: Recently, transformer-based solutions have exhibited remarkable success in 3D human pose estimation (3D-HPE) by computing pairwise relations between joints. However, we observe that the conventional self-attention mechanism in 3D-HPE tends to focus overly on a tiny fraction of joints. Moreover, these overfocused joints often lack relevance to the performed actions, so the resulting models struggle to generalize across poses. In this paper, we address this issue by incorporating prior information about human body structure through a plug-and-play Joint-Part Attention (JPA) module. First, we design a Part-aware Weighted Aggregation (PWA) module that merges joints into distinct body parts. Second, we introduce a Joint-Part Cross-scale Attention (JPCA) module that encourages the model to attend to a broader set of joints by letting joint tokens query part tokens across the two scales. In our experiments, we apply JPA to various transformer-based methods and demonstrate its superiority on the Human3.6M, MPI-INF-3DHP, and HumanEva datasets. We will release our code.
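To make the two-stage design concrete, below is a minimal PyTorch sketch of what a JPA block could look like, reconstructed only from the abstract's description. The class and parameter names (PartAwareWeightedAggregation, JointPartCrossScaleAttention, num_parts), the softmax-normalized joint-to-part assignment, and the exact cross-attention wiring are all assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of a Joint-Part Attention (JPA) block, based solely on
# the abstract: PWA merges joint tokens into part tokens, then JPCA lets joint
# tokens attend across both the joint scale and the part scale.
import torch
import torch.nn as nn


class PartAwareWeightedAggregation(nn.Module):
    """PWA (assumed form): merge J joint tokens into P part tokens via a
    learnable, softmax-normalized joint-to-part assignment matrix."""

    def __init__(self, num_joints: int, num_parts: int):
        super().__init__()
        # Learnable assignment logits; a kinematic prior (arms/legs/torso/head
        # groupings) could also be baked in here instead.
        self.assign = nn.Parameter(torch.randn(num_parts, num_joints))

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        # joints: (B, J, C) -> parts: (B, P, C)
        weights = self.assign.softmax(dim=-1)  # each part is a convex
        return torch.einsum("pj,bjc->bpc", weights, joints)  # combo of joints


class JointPartCrossScaleAttention(nn.Module):
    """JPCA (assumed form): joint tokens query the concatenation of
    joint-scale and part-scale tokens, spreading attention mass beyond a
    few overfocused joints."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, joints: torch.Tensor, parts: torch.Tensor) -> torch.Tensor:
        # Keys/values span two scales: individual joints and body parts.
        ctx = torch.cat([joints, parts], dim=1)  # (B, J + P, C)
        out, _ = self.attn(query=joints, key=ctx, value=ctx)
        return joints + out  # residual connection keeps JPA plug-and-play


class JointPartAttention(nn.Module):
    """Plug-and-play JPA block: PWA followed by JPCA."""

    def __init__(self, dim: int, num_joints: int = 17, num_parts: int = 5):
        super().__init__()
        self.pwa = PartAwareWeightedAggregation(num_joints, num_parts)
        self.jpca = JointPartCrossScaleAttention(dim)

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        parts = self.pwa(joints)       # (B, P, C)
        return self.jpca(joints, parts)  # (B, J, C)


# Example: 17 Human3.6M-style joints with 64-d tokens.
x = torch.randn(2, 17, 64)
print(JointPartAttention(dim=64)(x).shape)  # torch.Size([2, 17, 64])
```

Because the block maps (B, J, C) to (B, J, C) with a residual connection, it could in principle be dropped between the self-attention layers of an existing transformer-based 3D-HPE backbone, which matches the abstract's claim that JPA is applied to various transformer-based methods.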