Keywords: Robust to Occlusion, Analysis-by-synthesis, Human Pose Estimation, Generative Models, Representation with Human-relevant Insight
TL;DR: Robust human pose optimization with a volumetric neural human via analysis-by-synthesis
Abstract: Regression-based approaches dominate the field of 3D human pose estimation because they fit the training distribution quickly in a data-driven way. However, in this work we find that regression-based methods lack robustness under out-of-distribution conditions such as partial occlusion, due to their heavy dependence on the quality of predicted 2D keypoints, which are sensitive to occlusion. Inspired by neural mesh models for object pose estimation, i.e. meshes combined with neural features, we introduce a human pose optimization approach based on render-and-compare over neural features. Moreover, volume rendering offers a better representation, providing accurate gradients for reasoning about occlusion. In this work, we develop a volumetric human representation and a robust inference pipeline via volume rendering with gradient-based optimization, which synthesizes neural features during inference while gradually updating the human pose to maximize feature similarity. Experiments on 3DPW show that our method is more robust to partial occlusion while achieving competitive performance on unoccluded cases.
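The render-and-compare loop described above can be sketched in miniature. This is a hypothetical toy example, not the paper's pipeline: the "renderer" is a fixed linear map standing in for volume rendering of neural features, and `render_features`, `true_pose`, and the learning rate are all illustrative choices. The point is the inference-time mechanism: synthesize features from the current pose estimate, compare them to the observed features, and update the pose by gradient ascent on the similarity.

```python
# Toy analysis-by-synthesis sketch (illustrative names, not the paper's API).
# A fixed linear map W plays the role of a differentiable feature renderer;
# the pose is recovered by gradient ascent on a feature-similarity score.
W = [[1.0, 0.5], [-0.3, 0.8], [0.2, -1.1]]

def render_features(pose):
    # Stand-in for volume-rendering neural features from a pose estimate.
    return [sum(w * p for w, p in zip(row, pose)) for row in W]

def similarity_grad(pose, target):
    # Gradient of the similarity -||render(pose) - target||^2 w.r.t. pose.
    residual = [f - t for f, t in zip(render_features(pose), target)]
    return [-2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(pose))]

true_pose = [0.5, -1.2]                 # hypothetical ground-truth pose
target = render_features(true_pose)     # "observed" image features

pose = [0.0, 0.0]                       # initial pose estimate
lr = 0.05
for _ in range(2000):
    g = similarity_grad(pose, target)
    pose = [p + lr * gi for p, gi in zip(pose, g)]  # ascend similarity
```

In the actual method the renderer is a volumetric neural human and the gradients flow through volume rendering, which is what makes the comparison robust to partial occlusion; the toy linear map only illustrates the optimization structure.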