From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion

Published: 2025 · Last Modified: 09 Nov 2025 · ICME 2025 · CC BY-SA 4.0
Abstract: While weakly supervised multi-view face reconstruction (MVR) is attracting increasing attention, one critical issue remains open: how to effectively exchange and fuse information from multiple images to reconstruct high-precision 3D models. To this end, we propose a novel pipeline called Deep Fusion MVR (DF-MVR) that exploits feature correspondences between multi-view images to reconstruct high-precision 3D faces. Specifically, we present a novel multi-view feature fusion backbone that uses face masks to align features from multiple encoders and integrates a multi-layer attention mechanism to enhance feature interaction and fusion, yielding a unified facial representation. Additionally, we develop a concise face mask mechanism that facilitates multi-view feature fusion and facial reconstruction by identifying common regions and guiding the network's focus toward critical facial features (e.g., eyes, brows, nose, and mouth). Experiments on the Pixel-Face and Bosphorus datasets demonstrate the superiority of the proposed method: without 3D annotations, DF-MVR achieves relative RMSE improvements of 5.2% and 3.0% over existing weakly supervised MVR methods on Pixel-Face and Bosphorus, respectively. Our code is available at https://github.com/weiguangzhao/DF_MVR.
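
To make the fusion idea concrete, below is a minimal PyTorch sketch of mask-guided multi-view feature fusion with stacked attention layers. All module names (ViewEncoder, MaskGuidedFusion), layer sizes, and hyper-parameters are illustrative assumptions, not taken from the DF-MVR implementation; refer to the linked repository for the authors' actual architecture.

```python
# Sketch: per-view encoders -> mask-weighted feature tokens -> multi-layer
# attention across views -> one unified face representation.
import torch
import torch.nn as nn


class ViewEncoder(nn.Module):
    """Per-view CNN encoder producing a feature map for one input image (illustrative)."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):            # x: (B, 3, H, W)
        return self.net(x)           # (B, C, H/8, W/8)


class MaskGuidedFusion(nn.Module):
    """Fuse per-view features into one facial representation.

    Face masks down-weight non-facial regions; stacked multi-head attention
    layers let tokens from different views interact and fuse.
    """
    def __init__(self, dim=256, num_heads=4, num_layers=2, max_views=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.view_embed = nn.Parameter(torch.zeros(max_views, dim))

    def forward(self, feats, masks):
        # feats: list of V tensors (B, C, h, w); masks: list of V tensors (B, 1, H, W)
        tokens = []
        for v, (f, m) in enumerate(zip(feats, masks)):
            m = nn.functional.interpolate(m, size=f.shape[-2:], mode="nearest")
            f = f * m                                 # keep facial regions only
            t = f.flatten(2).transpose(1, 2)          # (B, h*w, C)
            tokens.append(t + self.view_embed[v])     # tag tokens with their view
        tokens = torch.cat(tokens, dim=1)             # (B, V*h*w, C)
        fused = self.attn(tokens)                     # cross-view interaction
        return fused.mean(dim=1)                      # (B, C) unified face code


if __name__ == "__main__":
    B, V, H, W = 2, 3, 128, 128
    encoder, fusion = ViewEncoder(), MaskGuidedFusion()
    images = [torch.randn(B, 3, H, W) for _ in range(V)]
    masks = [torch.ones(B, 1, H, W) for _ in range(V)]   # dummy face masks
    feats = [encoder(img) for img in images]
    face_code = fusion(feats, masks)
    print(face_code.shape)           # torch.Size([2, 256])
```

In this sketch the unified face code would subsequently drive a 3D face regressor (e.g., 3DMM coefficients); that stage, as well as the weakly supervised losses, is omitted here.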