Spatial-temporal-spectral transformer for 3D human pose estimation

Published: 01 Jan 2021, Last Modified: 13 Nov 2024, Venue: HPCC/DSS/SmartCity/DependSys 2021, License: CC BY-SA 4.0
Abstract: Human motion exhibits strong spatial-temporal correlation, and exploring the intrinsic correlation among joint motion trajectories can improve the performance of 3D pose estimation. We therefore propose a novel spatial-temporal-spectral transformer for high-quality 3D human pose estimation in videos, which consists mainly of a spatial-temporal transformer at the joint level and a spectral transformer at the joint trajectory level. The former explores joint-level dependencies in the skeleton graph structure and across the frame sequence to obtain a richer feature representation. The latter explores the dependencies among joint motion trajectories in the spectral domain. To obtain a more accurate 3D pose estimate for the center frame, a multi-layer strided convolution module reduces the full frame sequence to the center frame. In addition, since the 2D and 3D pose sequences share the same motion trajectory in the $xy$ plane, we add a consistency constraint to obtain more accurate estimates. Extensive experiments show that the proposed framework achieves state-of-the-art performance on Human3.6M.
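The reduction from the full frame sequence to the center frame can be illustrated with a minimal sketch. The abstract does not give kernel sizes, strides, layer counts, or the number of input frames, so the values here (27 frames, three layers, kernel and stride of 3) are assumptions, and the helper names `strided_conv1d` and `reduce_to_center` are hypothetical. Uniform averaging weights stand in for learned parameters, and one scalar weight per kernel tap is shared across channels, which is simpler than a real convolution layer.

```python
def strided_conv1d(seq, weights, stride):
    """Temporal 1D convolution without padding.

    seq:     list of T frames, each a list of C floats
    weights: list of K floats, one per kernel tap (shared across channels;
             a simplification of a learned layer)
    stride:  step between consecutive windows
    """
    k = len(weights)
    channels = len(seq[0])
    out = []
    for start in range(0, len(seq) - k + 1, stride):
        window = seq[start:start + k]
        frame = [sum(w * f[c] for w, f in zip(weights, window))
                 for c in range(channels)]
        out.append(frame)
    return out


def reduce_to_center(seq, kernel=3, num_layers=3):
    """Stack strided convolutions until a single frame remains.

    kernel and num_layers are assumed values, not taken from the paper.
    Returns the reduced sequence and the sequence length after each layer.
    """
    weights = [1.0 / kernel] * kernel  # placeholder for learned weights
    lengths = [len(seq)]
    for _ in range(num_layers):
        seq = strided_conv1d(seq, weights, stride=kernel)
        lengths.append(len(seq))
    return seq, lengths


# 27 frames with 2 feature channels per frame (toy values).
frames = [[float(t), 2.0 * t] for t in range(27)]
reduced, lengths = reduce_to_center(frames)
print(lengths)  # 27 -> 9 -> 3 -> 1
```

With stride equal to kernel size, each layer divides the temporal length by 3, so three layers collapse 27 frames into a single output frame whose receptive field covers the whole input window.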