Abstract: While weakly supervised multi-view face reconstruction (MVR) is garnering increased attention, one critical issue remains open: how to effectively exchange and fuse information across multiple images to reconstruct high-precision 3D models. To this end, we propose a novel pipeline called Deep Fusion MVR (DF-MVR) that exploits feature correspondences between multi-view images to reconstruct high-precision 3D faces. Specifically, we present a novel multi-view feature fusion backbone that uses face masks to align features from multiple encoders and integrates a multi-layer attention mechanism to enhance feature interaction and fusion, yielding a unified facial representation. Additionally, we develop a concise face mask mechanism that facilitates multi-view feature fusion and facial reconstruction by identifying common areas and guiding the network's focus toward critical facial features (e.g., eyes, brows, nose, and mouth). Experiments on the Pixel-Face and Bosphorus datasets demonstrate the superiority of the proposed method. Without 3D annotations, DF-MVR achieves relative RMSE improvements of 5.2% and 3.0% over existing weakly supervised MVR methods on the Pixel-Face and Bosphorus datasets, respectively. Our code is available at https://github.com/weiguangzhao/DF_MVR.
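To make the fusion idea concrete, the following is a minimal NumPy sketch of masked cross-view attention fusion in the spirit the abstract describes: per-view feature maps are aligned by face masks, and attention aggregates them into one representation. All shapes, the function name `fuse_views`, and the single-query-view design are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_views(features, masks):
    """Fuse per-view feature maps into one representation via masked attention.

    features: (V, N, D) -- V views, N spatial tokens, D channels (hypothetical shapes)
    masks:    (V, N)    -- 1 for facial-region tokens, 0 elsewhere
    Returns a single (N, D) fused feature map.
    """
    V, N, D = features.shape
    # Treat the first (reference) view as the query; remaining views as keys/values.
    q = features[0]                        # (N, D)
    fused = q.copy()
    for v in range(1, V):
        k = features[v]                    # (N, D)
        scores = q @ k.T / np.sqrt(D)      # (N, N) cross-view affinities
        # Suppress non-facial tokens of view v before normalising,
        # so attention only draws from masked (common) facial areas.
        scores = np.where(masks[v][None, :] > 0, scores, -1e9)
        attn = softmax(scores, axis=-1)    # (N, N) attention weights
        fused += attn @ k                  # aggregate view-v features
    # Keep only tokens visible in the reference view's face mask.
    return fused * masks[0][:, None]
```

This sketch shows only the masking-plus-attention pattern; the paper's backbone stacks multiple attention layers and learns the projections end to end.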
External IDs: dblp:conf/icmcs/ZhaoYYZYYDHH25