Abstract: We introduce a novel method for recovering consistent and dense 3D geometry and appearance of a dressed person from a monocular video. Existing methods mainly focus on tight clothing and recover the human geometry as a single representation. Our key idea is to regress the holistic 3D shape and appearance as canonical displacement and albedo maps in UV space, while fitting the visual observations across frames. Specifically, we represent the naked body shape with a UV-space SMPL model, and the remaining geometric details, including clothing, as a shape displacement UV map. We obtain a temporally coherent overall shape by leveraging a differentiable mask loss and a pose regularization. The surface details in UV space are learned jointly with the non-rigid deformation via differentiable neural rendering. Meanwhile, the skinning deformation in the garment region is updated periodically to account for its residual non-rigid motion in each frame. We additionally enforce
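To make the core representation concrete, the following is a minimal PyTorch sketch of the pipeline the abstract describes: a canonical SMPL surface is offset along its normals by a displacement sampled from a learnable UV map, then posed with linear blend skinning. All function names, tensor shapes, and the scalar-displacement assumption here are illustrative choices, not the authors' code.

```python
# Minimal sketch (hypothetical shapes/names): displacement-UV-map body model.
import torch
import torch.nn.functional as F

def sample_uv_map(uv_map, uv_coords):
    """Bilinearly sample a (C, H, W) UV-space map at per-vertex (V, 2) coords in [0, 1]."""
    grid = uv_coords[None, None] * 2.0 - 1.0           # (1, 1, V, 2), rescaled to [-1, 1]
    sampled = F.grid_sample(uv_map[None], grid, align_corners=True)
    return sampled[0, :, 0].T                          # (V, C)

def displaced_and_posed(verts_c, normals_c, uv_coords, disp_map, skin_w, joint_T):
    """Offset canonical vertices along their normals, then apply linear blend skinning.

    verts_c, normals_c: (V, 3) canonical SMPL vertices / normals
    uv_coords:          (V, 2) per-vertex UV coordinates
    disp_map:           (1, H, W) learnable scalar displacement UV map
    skin_w:             (V, J) skinning weights
    joint_T:            (J, 4, 4) per-joint rigid transforms for the current frame
    """
    d = sample_uv_map(disp_map, uv_coords)             # (V, 1) sampled displacement
    verts = verts_c + d * normals_c                    # clothed canonical shape
    T = torch.einsum('vj,jab->vab', skin_w, joint_T)   # per-vertex blended transforms
    verts_h = F.pad(verts, (0, 1), value=1.0)          # homogeneous coordinates (V, 4)
    return torch.einsum('vab,vb->va', T, verts_h)[:, :3]

# Toy usage with random data (SMPL has 6890 vertices and 24 joints):
V, J = 6890, 24
posed = displaced_and_posed(
    torch.randn(V, 3), F.normalize(torch.randn(V, 3), dim=-1),
    torch.rand(V, 2), torch.zeros(1, 256, 256, requires_grad=True),
    F.softmax(torch.randn(V, J), dim=-1),
    torch.eye(4).expand(J, 4, 4))
```

Because `disp_map` enters the posed surface through differentiable sampling and skinning, gradients from a rendering or mask loss can flow back into the UV map, which is what allows the surface details to be optimized across frames.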