Uncertainty-Aware 3D Human Pose Estimation from Monocular Video

2022 (modified: 16 Nov 2022), ACM Multimedia 2022, Readers: Everyone
Abstract: Estimating 3D human pose from monocular video is challenging, mainly because of depth ambiguity and inaccurately detected 2D keypoints. To quantify the depth uncertainty of the 3D human pose with a neural network, we incorporate uncertainty modeling into depth prediction using evidential deep learning (EDL). Meanwhile, to calibrate the distribution uncertainty of the 2D detections, we explore a probabilistic representation that models their realistic distribution. Specifically, we use EDL to measure the network's depth prediction uncertainty, and we decompose the x-y coordinates into individual distributions to model the deviation uncertainty of the inaccurate 2D keypoints. We then optimize the depth uncertainty parameters and calibrate the 2D deviations to obtain accurate 3D human poses. In addition, to provide effective latent features for uncertainty learning, we design an encoder that combines a graph convolutional network (GCN) and a transformer to learn discriminative spatio-temporal representations. Extensive experiments on three benchmarks (Human3.6M, MPI-INF-3DHP, and HumanEva-I) show that our model surpasses state-of-the-art methods by a large margin.
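
The abstract does not specify which evidential formulation is used for the depth coordinate. As a rough illustration only, the sketch below assumes the Normal-Inverse-Gamma (NIG) parameterization commonly used for evidential regression, applied to a single joint's depth value. The function names (`nig_params`, `nig_nll`, `depth_uncertainty`) and the softplus constraints are hypothetical choices for this example and are not taken from the paper.

```python
import math


def softplus(x):
    # Numerically stable softplus; keeps outputs strictly positive.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def nig_params(raw):
    """Map four unconstrained outputs to NIG parameters (assumed head design)."""
    g, v, a, b = raw
    gamma = g                    # predicted depth (mean)
    nu = softplus(v)             # nu > 0
    alpha = softplus(a) + 1.0    # alpha > 1 so the variances below are finite
    beta = softplus(b)           # beta > 0
    return gamma, nu, alpha, beta


def nig_nll(z, gamma, nu, alpha, beta):
    """Negative log-likelihood of depth target z under the NIG evidential model."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (z - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))


def depth_uncertainty(nu, alpha, beta):
    """Return (aleatoric, epistemic) variance estimates for the depth value."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic
```

Under this assumed formulation, a larger `nu` (more evidence) shrinks the epistemic term while leaving the aleatoric term unchanged, which is the kind of separation that lets depth uncertainty be measured and then optimized during training.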