Keywords: 3D pose estimation, Part-based Adaptive GNN, Frameset-based Skipped Transformer, efficient and robust
TL;DR: We propose a compact Graph and Skipped Transformer architecture to realise efficient and robust 2D-to-3D Human Pose Estimation.
Abstract: Recent works in 2D-to-3D pose uplifting for monocular 3D Human Pose Estimation (HPE) have shown significant progress. However, two key challenges persist in real-world applications: vulnerability to joint noise and high computational costs. These issues arise from the dense joint-frame connections and iterative correlations typically employed by mainstream GNN-based and Transformer-based methods. To address these challenges, we propose a novel approach that leverages human physical structure and long-range dynamics to learn spatial part- and temporal frameset-based representations. This method is inherently robust to missing or erroneous joints while also reducing model parameters. Specifically, in the Spatial Encoding stage, coarse-grained body parts are used to construct structural correlations with a fully adaptive graph topology. This spatial correlation representation is integrated with multi-granularity pose attributes to generate a comprehensive pose representation for each frame. In the Temporal Encoding and Decoding stages, Skipped Self-Attention is performed within framesets to establish long-term temporal dependencies from multiple perspectives of movement. On this basis, a compact Graph and Skipped Transformer (G-SFormer) is proposed, which realises efficient and robust 3D HPE in both experimental and practical scenarios. Extensive experiments on the Human3.6M, MPI-INF-3DHP and HumanEva benchmarks demonstrate that G-SFormer series models can compete with and outperform the state-of-the-art while using only a fraction of the parameters and around 1% of the computational cost. G-SFormer also exhibits outstanding robustness to inaccurately detected 2D poses. The source code will be available at https://sites.google.com/view/g-sformer.
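The frameset idea behind Skipped Self-Attention can be sketched roughly as follows: frames sampled at a fixed stride form one frameset, and attention is restricted to each frameset rather than the dense T × T frame grid. This is a minimal illustrative sketch only, not the paper's implementation; the function name is hypothetical and learned Q/K/V projections are replaced by identity maps for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def skipped_self_attention(X, stride):
    """Self-attention within framesets sampled at a fixed stride.

    X: (T, D) array of per-frame pose embeddings.
    Frames {i, i+stride, i+2*stride, ...} form one frameset; attention
    is computed only inside each frameset, covering long temporal range
    at a fraction of the cost of dense frame-to-frame attention.
    """
    T, D = X.shape
    out = np.zeros_like(X)
    for i in range(stride):
        idx = np.arange(i, T, stride)        # indices of one frameset
        F = X[idx]                           # (len(idx), D)
        scores = F @ F.T / np.sqrt(D)        # identity Q/K projections (sketch)
        out[idx] = softmax(scores) @ F       # identity V projection (sketch)
    return out
```

With stride 1 this reduces to ordinary dense temporal self-attention; larger strides trade per-frameset density for longer temporal reach per attention operation.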
Supplementary Material: pdf
Primary Area: learning on time series and dynamical systems
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3002