Abstract: This work introduces a novel Convolutional Network (ConvNet) architecture for the task of human pose estimation, that is, the localization of body joints in a single static image. The proposed coarse-to-fine architecture addresses a shortcoming of the baseline architecture: large inaccuracies of its coarse ConvNet cannot be corrected by the refinement ConvNet, which refines the estimate only within small windows around the coarse prediction. This is achieved by (a) changes in architectural parameters that both increase the accuracy of the coarse model and make the refinement model more capable of correcting the errors of the coarse model, (b) the introduction of a Markov Random Field (MRF)-based spatial model network between the coarse and the refinement models that introduces geometric constraints, and (c) a training scheme that adapts the data augmentation and the learning rate according to the difficulty of the data examples. The proposed architecture is trained in an end-to-end fashion. Experimental results show that the proposed method improves on the baseline model and provides state-of-the-art results on the FashionPose [8] and MPII [1] benchmarks.
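The window limitation described above can be illustrated with a minimal sketch. It is not the architecture of this work: the refinement ConvNet is replaced by a plain argmax over an idealised Gaussian heatmap, and the names `refine_in_window` and `make_heatmap`, the 64x64 grid, and the window half-size of 4 pixels are all assumptions chosen for illustration.

```python
import numpy as np

def refine_in_window(fine_heatmap, coarse_xy, half_size):
    """Return the argmax of fine_heatmap restricted to a square window
    centred on the coarse prediction (x, y).

    Hypothetical stand-in for a refinement stage: it can only move the
    estimate to a location inside the window.
    """
    h, w = fine_heatmap.shape
    cx, cy = coarse_xy
    x0, x1 = max(cx - half_size, 0), min(cx + half_size + 1, w)
    y0, y1 = max(cy - half_size, 0), min(cy + half_size + 1, h)
    window = fine_heatmap[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    return (x0 + int(dx), y0 + int(dy))

def make_heatmap(shape, joint_xy, sigma=2.0):
    """Idealised Gaussian heatmap peaked at the true joint location (assumption)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    jx, jy = joint_xy
    return np.exp(-((xs - jx) ** 2 + (ys - jy) ** 2) / (2 * sigma ** 2))

true_joint = (40, 30)
heatmap = make_heatmap((64, 64), true_joint)

# Small coarse error: the true joint lies inside the window and is recovered.
small_error = refine_in_window(heatmap, coarse_xy=(43, 28), half_size=4)

# Large coarse error: the true joint lies outside the window, so the best
# the refinement can do is the window edge nearest to it.
large_error = refine_in_window(heatmap, coarse_xy=(55, 30), half_size=4)

print(small_error, large_error)
```

In this toy setting the refinement recovers the joint exactly when the coarse error is small, but with the large coarse error it stops at the window boundary, 11 pixels from the true location. This is the kind of failure that a more accurate coarse model and geometric constraints between the two stages are meant to reduce.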