WS-GCN: Integrating GCN with Weak Supervision for Enhanced 3D Human Pose Estimation

Zhenxiang Jiang; Yingyu Chen

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, ICCAI 2024, CC BY-SA 4.0

Abstract: Precisely estimating 3D human pose is an important and challenging goal in computer vision. The field has advanced significantly with the advent of deep learning and graph convolutional networks (GCNs), especially in contextualizing and analyzing human motion from visual input. This study proposes a strategy that combines the structural stability of GCNs with the robustness of weakly supervised learning to address two fundamental challenges of 3D human pose estimation: the scarcity of large in-the-wild 3D datasets and the difficulty of modeling 3D spatial data. The central component of our proposal is WS-GCN, a weakly supervised semantic graph convolutional network. By building a semantic graph, the model represents the complex semantic and spatial relationships between anatomical joints. With the integration of a non-local layer, WS-GCN improves the accuracy of projecting 2D coordinates into 3D space. Empirical evaluations demonstrate the effectiveness of the method, especially when it is combined with a bone length penalty and a fully supervised warm-up training stage. Together, these additions strengthen the model's performance in the weakly supervised setting. Under full supervision, the model achieves a Mean Per Joint Position Error (MPJPE) of 41.95 mm and a Procrustes-aligned MPJPE (P-MPJPE) of 33.40 mm. Under weak supervision, it attains an MPJPE of 49.23 mm and a P-MPJPE of 39.88 mm. Notably, the model ranked third overall in weakly supervised 3D human pose estimation on the Human3.6M dataset and attained state-of-the-art (SOTA) results in single-view, single-frame weakly supervised 3D human pose estimation on the same dataset.
This work represents a step forward in 3D human pose estimation and lays a foundation for further research and possible applications in related fields such as virtual reality and interactive computing. The implementation is available at https://github.com/RoyMikeJiang/WSGCN.
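For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal NumPy sketch of how MPJPE and P-MPJPE are commonly computed for a single pose. It is not taken from the authors' repository: the function names `mpjpe`, `procrustes_align`, and `p_mpjpe` are hypothetical, and the inputs are assumed to be arrays of shape (J, 3) in millimetres. The alignment step uses the standard SVD-based similarity fit (rotation, uniform scale, translation).

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance over joints; pred and gt have shape (J, 3)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def procrustes_align(pred, gt):
    """Fit pred to gt with a similarity transform (assumed standard definition)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g            # centre both poses
    U, S, Vt = np.linalg.svd(p.T @ g)        # SVD of the 3x3 cross-covariance
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # optimal rotation
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()  # optimal uniform scale
    return scale * p @ R.T + mu_g

def p_mpjpe(pred, gt):
    """MPJPE measured after Procrustes alignment of pred onto gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because P-MPJPE removes global rotation, scale, and translation before measuring error, it is never larger than MPJPE on the same prediction up to numerical precision, which is consistent with the pairs of numbers reported above.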
