3D Priors-Guided Diffusion for Blind Face Restoration

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM2024 Poster · CC BY 4.0
Abstract: Blind face restoration aims to restore a sharp face image from a degraded counterpart. Recent methods using GANs as priors have achieved considerable success in this domain. However, these methods still struggle to balance realism and fidelity when facing complex degradation scenarios. In this paper, we propose a novel framework that embeds 3D facial priors into a denoising diffusion model, enabling the extraction of facial structure and identity information from 3D facial images. Specifically, the degraded image is first processed by a pre-trained restoration network to obtain a partially restored face image. This image is then fed into the 3D Morphable Model (3DMM) to reconstruct a 3D facial image. During the denoising process, structural and identity information is extracted from the 3D prior image using a multi-level feature extraction module. Given that the denoising process of the diffusion model primarily refines structure first and then enhances texture details, we propose a time-aware fusion block (TAFB) that provides more effective fusion information for denoising as the time step changes. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration.
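
To make the time-aware fusion idea concrete, below is a minimal PyTorch sketch of one fusion level, under the assumption that the TAFB gates how strongly 3D-prior features are injected based on the diffusion timestep embedding (structure-heavy early, texture-heavy late). The class name `TimeAwareFusionBlock` and the specific gating design are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a time-aware fusion block; not the paper's code.
import torch
import torch.nn as nn


class TimeAwareFusionBlock(nn.Module):
    """Fuses denoiser features with 3D-prior features, weighted by the
    diffusion timestep: early (noisy) steps emphasize structural cues from
    the 3D prior, later steps rely more on the denoiser's own texture detail."""

    def __init__(self, channels: int, time_dim: int = 128):
        super().__init__()
        # Map the timestep embedding to a per-channel gate in [0, 1].
        self.time_gate = nn.Sequential(
            nn.Linear(time_dim, channels),
            nn.SiLU(),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, denoiser_feat, prior_feat, t_emb):
        # gate: (B, C, 1, 1) controls how much 3D-prior structure is injected.
        gate = self.time_gate(t_emb).unsqueeze(-1).unsqueeze(-1)
        fused = torch.cat([denoiser_feat, gate * prior_feat], dim=1)
        return denoiser_feat + self.proj(fused)


# Usage sketch (shapes only): fuse one feature level inside the denoising U-Net.
if __name__ == "__main__":
    B, C, H, W, T = 2, 64, 32, 32, 128
    block = TimeAwareFusionBlock(channels=C, time_dim=T)
    denoiser_feat = torch.randn(B, C, H, W)  # U-Net feature at this level
    prior_feat = torch.randn(B, C, H, W)     # feature extracted from the 3DMM render
    t_emb = torch.randn(B, T)                # timestep embedding
    print(block(denoiser_feat, prior_feat, t_emb).shape)  # torch.Size([2, 64, 32, 32])
```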
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: This work significantly enhances multimedia/multimodal processing by pioneering a methodology that integrates 3D facial priors with a denoising diffusion model to extract structural and identity details from 3D facial images. By leveraging 3D facial information in the restoration process, this framework not only improves the fidelity of the restoration but also enhances the realism of the output. The use of the 3D Morphable Model (3DMM) enables the reconstruction of detailed 3D facial images, enriching the restoration process with intricate features. Moreover, the time-aware fusion block (TAFB) provides adaptive fusion information for denoising across different time steps, contributing to more effective and dynamic restoration results. This approach, demonstrated through superior performance compared to existing methods, showcases significant advancements in multimedia/multimodal processing, particularly in blind face restoration scenarios where preserving structural and identity information is crucial.
Supplementary Material: zip
Submission Number: 5035