Keywords: 3D Human pose and shape, camera calibration, SMPLify fitting
TL;DR: We train a camera calibration model on real-world images of humans and leverage the estimated camera intrinsics to improve the accuracy of 3D human pose and shape estimation.
Abstract: In this work, we address the challenge of accurate 3D human pose and shape (HPS) estimation from monocular images. The key to accuracy and robustness lies in high-quality training data. Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations, assuming a simplified camera with default intrinsics. We make two contributions that improve pGT accuracy.
First, to estimate camera intrinsics, we develop HumanFoV, a field-of-view prediction model trained on a dataset of images containing people. We use the estimated intrinsics to enhance the 4D-Humans dataset by incorporating a full perspective camera model during SMPLify fitting.
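As an illustration of how a predicted field of view translates into full-perspective intrinsics, the sketch below builds a pinhole camera matrix assuming square pixels and a principal point at the image center, and projects camera-frame 3D points with it. The function and variable names are ours, not from the paper's code.

```python
import numpy as np

def intrinsics_from_vfov(vfov_deg, img_h, img_w):
    """Build a pinhole intrinsics matrix from a predicted vertical FoV,
    assuming square pixels and the principal point at the image center."""
    f = (img_h / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)  # focal length in pixels
    K = np.array([[f,   0.0, img_w / 2.0],
                  [0.0, f,   img_h / 2.0],
                  [0.0, 0.0, 1.0]])
    return K

def project(points_3d, K):
    """Full-perspective projection of Nx3 camera-frame points to 2D pixels."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]
```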
Second, 2D joints provide limited constraints on 3D body shape, often resulting in average-looking bodies. To address this, we use the BEDLAM dataset to train a dense surface keypoint detector. We apply this detector to the 4D-Humans dataset and modify SMPLify to fit the detected keypoints, resulting in significantly more realistic body shapes.
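To make the modified data term concrete, here is a minimal sketch of a dense-keypoint reprojection loss in an SMPLify-style objective, assuming each detected 2D keypoint corresponds to a fixed SMPL surface vertex index. The robustifier settings, weighting, and index definitions are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def gmof(x, sigma=100.0):
    """Geman-McClure robustifier commonly used in SMPLify-style fitting."""
    x2 = x ** 2
    return (sigma ** 2) * x2 / (sigma ** 2 + x2)

def dense_keypoint_loss(vertices, vert_idx, keypoints_2d, conf, K):
    """Reprojection loss between SMPL surface vertices at fixed indices
    and detected dense 2D keypoints, using full-perspective projection."""
    pts = vertices[:, vert_idx, :]                  # (B, N, 3) camera-frame points
    uvw = torch.einsum('bni,ji->bnj', pts, K)       # apply intrinsics
    uv = uvw[..., :2] / uvw[..., 2:3]               # perspective divide
    residual = gmof(uv - keypoints_2d).sum(dim=-1)  # robust per-keypoint error
    return (conf * residual).sum()                  # confidence-weighted sum
```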
Finally, we enhance the HMR2.0 architecture to include the estimated camera parameters. We then iterate between model training and SMPLify fitting, initializing each fitting round with the previously trained model. This leads to more accurate pGT and significant performance gains. Our method, CameraHMR, achieves state-of-the-art 3D accuracy on HPS benchmarks. Code will be available for research purposes.
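One common way to condition a regression head on camera parameters is to concatenate normalized intrinsics with the image features before predicting body parameters; the module below is a generic sketch under that assumption, not the actual CameraHMR or HMR2.0 architecture.

```python
import torch
import torch.nn as nn

class CameraConditionedHead(nn.Module):
    """Illustrative regression head that concatenates normalized camera
    intrinsics with image features before predicting body parameters.
    Dimensions and layer sizes are placeholders."""
    def __init__(self, feat_dim=2048, cam_dim=3, out_dim=157):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + cam_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, out_dim),  # e.g. pose, shape, and camera translation
        )

    def forward(self, img_feats, cam_params):
        # cam_params: e.g. focal length and principal point, normalized by image size
        return self.mlp(torch.cat([img_feats, cam_params], dim=-1))
```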
Supplementary Material: zip
Submission Number: 337