Deep Convolutional Networks for Marker-less Human Pose Estimation from Multiple Views

Matthew Trumble, Andrew Gilbert, Adrian Hilton, John P. Collomosse

2016 (modified: 26 Sept 2022), CVMP 2016

Abstract: We propose a human performance capture system employing convolutional neural networks (CNNs) to estimate human pose from a volumetric representation of a performer derived from multiple viewpoint video (MVV). We compare direct CNN pose regression to the performance of an affine-invariant pose descriptor learned by a CNN through a classification task. A non-linear manifold embedding is learned between the descriptor and articulated pose spaces, enabling regression of pose from the source MVV. The results are evaluated against ground truth pose data captured using a Vicon marker-based system and demonstrate good generalisation over a range of human poses, providing a system that requires no special suit to be worn by the performer.
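The abstract does not say how the volumetric representation is derived from the MVV. As a rough illustration only, one common way to obtain such a volume is silhouette carving (a visual hull): a voxel is kept if it projects inside the foreground mask of every camera. The sketch below assumes this technique, calibrated 3x4 projection matrices, and binary silhouettes; the function name `carve_visual_hull` and all parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def carve_visual_hull(silhouettes, projections, grid_points):
    """Return one boolean occupancy flag per 3D point (illustrative sketch).

    silhouettes: list of (H, W) boolean foreground masks, one per camera.
    projections: list of (3, 4) camera projection matrices (assumed calibrated).
    grid_points: (N, 3) array of voxel centres in world coordinates.
    """
    n = grid_points.shape[0]
    homogeneous = np.hstack([grid_points, np.ones((n, 1))])  # (N, 4)
    occupied = np.ones(n, dtype=bool)

    for mask, P in zip(silhouettes, projections):
        h, w = mask.shape
        proj = homogeneous @ P.T  # (N, 3) homogeneous image coordinates
        depth = proj[:, 2]
        in_front = depth > 1e-9  # discard points behind the camera
        safe_depth = np.where(in_front, depth, 1.0)  # avoid division by zero
        u = np.round(proj[:, 0] / safe_depth).astype(int)  # pixel column
        v = np.round(proj[:, 1] / safe_depth).astype(int)  # pixel row
        inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

        # A point survives this view only if it lands on a foreground pixel.
        hit = np.zeros(n, dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        occupied &= hit

    return occupied
```

Under these assumptions, the resulting occupancy flags, reshaped to a 3D grid, would be the kind of volumetric input a CNN could consume; the paper's actual reconstruction and network input format may differ.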
