3D Human Skeleton Estimation from Monocular Single RGB Image based on Multiple Virtual-View Skeleton Generation

Wen-Nung Lie, Veasna Vann, Lee Aing, Jui-Chiu Chiang

Published: 2022, Last Modified: 05 Nov 2024, MMSP 2022, CC BY-SA 4.0

Abstract: 3D human skeleton estimation from a single RGB image is a challenging problem in computer vision. Motivated by the advantages of multi-view approaches, we propose a new two-stage approach. The 1st stage estimates a set of 3D heatmaps, from which the 2D image coordinates and relative depth of each joint can be derived for a set of 2D skeletons. It consists of multiple streams: the 1st stream predicts the 2D+depth skeleton for the real view, and the other streams predict the skeleton counterparts for virtual views; the network is thus called the Multiple Virtual-View Skeleton Generator (MVSG). The 2nd stage contains a depth-denoising and fusion network, in which the outputs of the MVSG network are depth-denoised and then fused by concatenation for regression into the final 3D skeleton. Experiments show that our technique achieves an MPJPE of 46.75 mm, which is comparable to state-of-the-art methods.
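For readers unfamiliar with the reported metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. The following is a minimal sketch of that definition only, not the authors' evaluation code; the function name `mpjpe`, the three-joint skeleton, and its coordinates are hypothetical, and any alignment step an evaluation protocol may apply before measuring (for example, root-joint alignment) is assumed to have been done already.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean per-joint position error.

    Both arguments are sequences of (x, y, z) joint positions in the same
    unit (for example millimetres) and in the same joint order.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("skeletons must be non-empty and have the same joint count")
    # Euclidean distance per joint, averaged over all joints.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Hypothetical 3-joint skeleton, coordinates in millimetres (illustration only).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, gt)  # per-joint distances are 0, 5, and 10 mm
```

In a benchmark setting the per-skeleton value would typically be averaged again over all test frames; that outer loop is omitted here.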