Abstract: We propose a deep convolutional neural network for 3D human pose and camera estimation from monocular images that learns from 2D joint annotations. The proposed network follows the typical architecture, but contains an additional output layer which projects predicted 3D joints onto 2D, and enforces constraints on body part lengths in 3D. We further enforce pose constraints using an independently trained network that learns a prior distribution over 3D poses. We evaluate our approach on several benchmark datasets and compare against state-of-the-art approaches for 3D human pose estimation, achieving comparable performance. Additionally, we show that our approach significantly outperforms other methods in cases where 3D ground truth data is unavailable, and that our network exhibits good generalization properties.
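
As a rough illustration of the two ideas the abstract names, projecting predicted 3D joints onto 2D and constraining body part lengths in 3D, the following is a minimal NumPy sketch. It is not the authors' implementation. The weak-perspective camera (a scale and a 2D translation), the squared-error losses, the function names, and the toy three-joint skeleton are all assumptions made for illustration; the abstract does not specify the camera model or the form of the constraints.

```python
import numpy as np


def project_weak_perspective(joints_3d, scale, translation):
    """Project (J, 3) joints to (J, 2): drop depth, then scale and shift.

    Assumption: a weak-perspective camera; the abstract does not name one.
    """
    return scale * joints_3d[:, :2] + translation


def reprojection_loss(joints_3d, joints_2d, scale, translation):
    """Mean squared distance between projected joints and 2D annotations."""
    projected = project_weak_perspective(joints_3d, scale, translation)
    return float(np.mean(np.sum((projected - joints_2d) ** 2, axis=1)))


def bone_length_penalty(joints_3d, bones, target_lengths):
    """Mean squared deviation of 3D bone lengths from target lengths.

    `bones` is a list of (parent_index, child_index) pairs (hypothetical).
    """
    parents = joints_3d[[p for p, _ in bones]]
    children = joints_3d[[c for _, c in bones]]
    lengths = np.linalg.norm(children - parents, axis=1)
    return float(np.mean((lengths - target_lengths) ** 2))


# Toy three-joint chain; values are invented for the example.
joints_3d = np.array([[0.0, 0.0, 2.0],
                      [0.0, 1.0, 2.0],
                      [1.0, 1.0, 2.5]])
bones = [(0, 1), (1, 2)]
target_lengths = np.array([1.0, 1.0])
scale, translation = 2.0, np.array([5.0, 5.0])

# Synthesize 2D annotations from the same pose, so the reprojection loss is zero.
joints_2d = project_weak_perspective(joints_3d, scale, translation)

total_loss = (reprojection_loss(joints_3d, joints_2d, scale, translation)
              + bone_length_penalty(joints_3d, bones, target_lengths))
```

In this sketch the 2D term needs only 2D joint annotations, while the bone-length term constrains the otherwise unobserved depth. That is one plausible reading of how such a network could learn 3D poses without 3D ground truth.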