Text-Guided 3D Face Synthesis - From Generation to Editing

Published: 01 Jan 2024, Last Modified: 14 Nov 2024 · CVPR 2024 · CC BY-SA 4.0
Abstract: Text-guided 3D face synthesis has achieved remarkable results by leveraging text-to-image (T2I) diffusion models. However, most existing works focus solely on direct generation and ignore editing, which restricts them from synthesizing customized 3D faces through iterative adjustments. In this paper, we propose a unified text-guided framework spanning face generation and editing. In the generation stage, we propose a geometry-texture decoupled generation to mitigate the loss of geometric details caused by coupling. Besides, decoupling enables us to utilize the generated geometry as a condition for texture generation, yielding highly geometry-texture aligned results. We further employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the texts. To enable sequential editing, we introduce a UV domain consistency preservation regularization, preventing unintentional changes to irrelevant facial attributes. Besides, we propose a self-guided consistency weight strategy to improve editing efficacy while preserving consistency. Through comprehensive experiments, we showcase our method's superiority in face synthesis. Project page: https://faceg2e.github.io/.
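A minimal sketch (not the authors' released code) of one plausible reading of the UV-domain consistency preservation regularization with a self-guided weight: a per-texel penalty on deviations from the pre-edit texture, downweighted where a relevance map says the text edit should act. The tensor names, the `relevance` map, and `lambda_reg` are illustrative assumptions.

```python
import torch

def uv_consistency_loss(uv_tex_edit: torch.Tensor,
                        uv_tex_orig: torch.Tensor,
                        relevance: torch.Tensor) -> torch.Tensor:
    """Penalize changes to UV texels that are irrelevant to the current edit.

    uv_tex_edit, uv_tex_orig: (B, 3, H, W) UV texture maps after / before editing.
    relevance: (B, 1, H, W) values in [0, 1]; high where the text prompt should
               change the face (hypothetically derived from the diffusion model's
               guidance), low on attributes that should stay fixed.
    """
    # Self-guided weight: texels untouched by the edit get a strong consistency
    # penalty, texels the edit targets are left free to change.
    weight = 1.0 - relevance
    per_texel = (uv_tex_edit - uv_tex_orig).pow(2).sum(dim=1, keepdim=True)
    return (weight * per_texel).mean()

# Usage sketch inside a sequential-editing loop (names hypothetical):
# loss = edit_guidance_loss(...) + lambda_reg * uv_consistency_loss(tex, tex_ref, rel_map)
```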