Double Discrete Representation for 3D Human Pose Estimation from Head-mounted Camera

Juheon Hwang, Jiwoo Kang

Published: 01 Jan 2024, Last Modified: 19 Jan 2025. Venue: ICCE 2024. License: CC BY-SA 4.0

Abstract: This work proposes a method for accurately estimating 3D human pose from an egocentric image captured by a head-mounted camera. A third-person-view camera has a limited field of view, which makes it difficult to capture many dynamic situations outside a motion capture system. Several methods therefore use egocentric views to overcome these spatial constraints. However, in the egocentric view of a head-mounted camera, the lower body often appears smaller and is occluded by the upper body, which makes pose estimation unreliable and inaccurate. To address this limitation, we propose an estimation pipeline that uses a Vector Quantized-Variational AutoEncoder (VQ-VAE) to predict the human pose from egocentric images and then optimize the predicted pose. The pipeline performs pose estimation and optimization with a codebook obtained by learning egocentric image features and pose features from large human pose datasets with VQ-VAE. The vector quantizer of the VQ-VAE can help improve the generalization performance of 3D pose estimation from the egocentric view. In comparative experiments, our method achieves a significant performance improvement over state-of-the-art methods.
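
The abstract does not spell out the architecture, so the following is only a minimal sketch of the generic vector-quantization step found in VQ-VAE models, not the authors' pipeline. It maps each continuous feature vector to its nearest codebook entry under L2 distance. The function name `quantize`, the toy codebook, and the feature values are all assumptions made for illustration.

```python
import numpy as np


def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (L2 distance).

    features: array of shape (N, D), continuous encoder outputs (hypothetical).
    codebook: array of shape (K, D), discrete code vectors (hypothetical).
    Returns the chosen code indices (N,) and the quantized vectors (N, D).
    """
    # Squared distances via ||f||^2 - 2 f.c + ||c||^2, giving shape (N, K).
    d2 = (
        np.sum(features ** 2, axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + np.sum(codebook ** 2, axis=1)
    )
    indices = np.argmin(d2, axis=1)
    return indices, codebook[indices]


# Toy codebook with 8 well-separated 4-dimensional entries.
codebook = np.arange(32, dtype=float).reshape(8, 4)

# Two features that sit close to codebook entries 3 and 5.
features = codebook[[3, 5]] + 0.1

indices, quantized = quantize(features, codebook)
```

In a trained model the codebook would be learned jointly with the encoder and decoder; here it is fixed so that the nearest-neighbour lookup is easy to follow. Snapping features to a finite set of learned codes is the property the abstract credits with helping generalization.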