Abstract: Reconstructing a 3D face model with high-quality geometry and texture from a single face image is an ill-posed and challenging problem. On the one hand, many methods rely heavily on large amounts of training data, which are not easy to obtain. On the other hand, positional local features of a face surface cannot capture the global information of the entire face. Due to these challenges, existing methods can hardly reconstruct detailed geometry and realistic textures. To address these issues, we propose a multi-modal feature-guided 3D face reconstruction method, named MMFG, which does not require any training data and can generate detailed geometry from a single image. Specifically, we represent the reconstructed 3D face as a signed distance field and combine positional local features with multi-modal global features to reconstruct a detailed 3D face. To obtain region-aware information, a Swin Transformer is used as our global feature extractor to extract multi-modal global features from rendered multi-view RGB images and depth images. Furthermore, considering the different effects of RGB and depth information on albedo and shading, we use the global features of each modality to guide the recovery of the corresponding BRDF components during differentiable rendering. Experimental results demonstrate that the proposed method generates more detailed 3D faces, achieving state-of-the-art results on texture reconstruction and competitive results on shape reconstruction on the NoW dataset.
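To make the described pipeline more concrete, the following is a minimal PyTorch sketch of how multi-modal global features extracted by a Swin Transformer from rendered RGB and depth views could condition a signed-distance predictor alongside per-point local features. It assumes the timm implementation of the Swin Transformer; the class names, feature dimensions, and view-averaging step are illustrative assumptions rather than the authors' actual implementation.

```python
import torch
import torch.nn as nn
import timm  # assumed dependency providing a Swin Transformer backbone


class GlobalFeatureExtractor(nn.Module):
    """Illustrative multi-modal global feature extractor: encodes rendered
    multi-view RGB and depth images with per-modality Swin Transformers and
    averages the pooled features over views."""

    def __init__(self):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits.
        self.rgb_encoder = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False, num_classes=0)
        self.depth_encoder = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False, num_classes=0)

    def forward(self, rgb_views, depth_views):
        # rgb_views:   (V, 3, 224, 224) rendered multi-view RGB images
        # depth_views: (V, 1, 224, 224) rendered depth maps
        f_rgb = self.rgb_encoder(rgb_views).mean(dim=0)  # (C,)
        # Repeat the depth channel so it matches the 3-channel input stem.
        f_depth = self.depth_encoder(depth_views.repeat(1, 3, 1, 1)).mean(dim=0)
        return f_rgb, f_depth


class SDFHead(nn.Module):
    """Illustrative MLP combining a per-point local feature with the two
    global features to predict a signed distance value per query point."""

    def __init__(self, local_dim=32, global_dim=768, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + local_dim + 2 * global_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, xyz, local_feat, f_rgb, f_depth):
        # xyz: (N, 3) query points; local_feat: (N, local_dim)
        g = torch.cat([f_rgb, f_depth]).expand(xyz.shape[0], -1)
        return self.mlp(torch.cat([xyz, local_feat, g], dim=-1))
```

In the full method, the RGB-derived and depth-derived global features would additionally guide the albedo and shading branches of the BRDF during differentiable rendering; the sketch above only illustrates the feature-combination step for the SDF geometry.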