Abstract: Egocentric 3D human pose estimation with a single fisheye camera has recently drawn significant attention. However, existing methods struggle to estimate poses from in-the-wild images because, in the absence of large-scale in-the-wild egocentric datasets, they can only be trained on synthetic data. Furthermore, these methods fail easily when body parts are occluded by, or interact with, the surrounding scene. To address the shortage of in-the-wild data, we collect a large-scale in-the-wild egocentric dataset called Egocentric Poses in the Wild (EgoPW). The dataset is captured with a head-mounted fisheye camera and an auxiliary external camera; during training, the external camera provides an additional observation of the human body from a third-person perspective. We present a new egocentric pose estimation method that can be trained on this dataset with weak external supervision. Specifically, we first generate pseudo labels for the EgoPW dataset with a spatio-temporal optimization method that incorporates the external-view supervision. The pseudo labels are then used to train an egocentric pose estimation network. To facilitate network training, we propose a novel learning strategy that supervises the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model. Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms state-of-the-art methods both quantitatively and qualitatively.
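The abstract does not spell out the spatio-temporal optimization used to generate pseudo labels. As a rough illustration only, the sketch below fits a short sequence of 3D joint positions to 2D keypoints observed in two views (stand-ins for the egocentric and external cameras) while penalizing frame-to-frame motion, using plain gradient descent. It assumes pinhole cameras in place of the fisheye model, known camera poses, and noise-free 2D keypoints. Every function name (project, optimize_pseudo_labels, and so on) is hypothetical, and this is a generic multi-view fitting objective, not the EgoPW authors' formulation.

```python
import numpy as np

def project(points, R, t, f=1.0):
    """Pinhole projection of (..., 3) world points into camera (R, t).
    Assumption: a pinhole model stands in for the real fisheye model."""
    cam = points @ R.T + t
    return f * cam[..., :2] / cam[..., 2:3], cam

def view_loss_and_grad(poses, keypoints_2d, R, t, f=1.0):
    """Squared 2D reprojection error in one view and its gradient w.r.t. poses."""
    uv, cam = project(poses, R, t, f)
    r = uv - keypoints_2d
    z = cam[..., 2]
    g_cam = np.empty_like(cam)
    g_cam[..., 0] = 2.0 * f * r[..., 0] / z
    g_cam[..., 1] = 2.0 * f * r[..., 1] / z
    g_cam[..., 2] = -2.0 * f * (r[..., 0] * cam[..., 0] + r[..., 1] * cam[..., 1]) / z**2
    # Chain rule back to world coordinates: dL/dX = R^T dL/dXc (row-vector form).
    return float(np.sum(r**2)), g_cam @ R

def smoothness_loss_and_grad(poses, weight):
    """Temporal term: weight * sum_t ||X_{t+1} - X_t||^2 over a (T, J, 3) sequence."""
    d = poses[1:] - poses[:-1]
    g = np.zeros_like(poses)
    g[1:] += 2.0 * weight * d
    g[:-1] -= 2.0 * weight * d
    return float(weight * np.sum(d**2)), g

def total_loss_and_grad(poses, views, smooth_weight):
    """Sum of weighted per-view reprojection terms plus the smoothness term."""
    loss, grad = smoothness_loss_and_grad(poses, smooth_weight)
    for keypoints_2d, R, t, w in views:
        l, g = view_loss_and_grad(poses, keypoints_2d, R, t)
        loss += w * l
        grad += w * g
    return loss, grad

def optimize_pseudo_labels(init_poses, views, smooth_weight=1.0, lr=1e-2, iters=500):
    """Hypothetical pseudo-label refinement by plain gradient descent."""
    poses = init_poses.copy()
    for _ in range(iters):
        _, grad = total_loss_and_grad(poses, views, smooth_weight)
        poses -= lr * grad
    return poses

def rot_y(angle):
    """Rotation about the world y-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Synthetic demo: 10 frames, 15 joints, slow linear motion, two orthogonal views.
rng = np.random.default_rng(0)
T, J = 10, 15
base = rng.uniform(-0.5, 0.5, size=(J, 3))
gt = base[None] + 0.002 * np.arange(T)[:, None, None]
cams = [(np.eye(3), np.array([0.0, 0.0, 3.0])),           # stand-in "egocentric" view
        (rot_y(np.pi / 2), np.array([0.0, 0.0, 3.0]))]    # stand-in "external" view
views = [(project(gt, R, t)[0], R, t, 1.0) for R, t in cams]
init = gt + 0.1 * rng.standard_normal(gt.shape)
refined = optimize_pseudo_labels(init, views, smooth_weight=1.0, lr=0.05, iters=1000)
init_loss, _ = total_loss_and_grad(init, views, 1.0)
refined_loss, _ = total_loss_and_grad(refined, views, 1.0)
init_err = np.linalg.norm(init - gt, axis=-1).mean()
refined_err = np.linalg.norm(refined - gt, axis=-1).mean()
```

With two roughly orthogonal views, each world coordinate is constrained by an image axis in at least one camera, so the combined reprojection terms resolve the depth that a single view leaves ambiguous. The smoothness weight trades temporal stability against fidelity to the per-frame 2D observations.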