High-Fidelity Talking Portrait Synthesis with Personalized 3D Generative Prior

Published: 07 Aug 2025, Last Modified: 22 Aug 2025, Venue: Gen4AVC Poster, License: CC BY 4.0
Keywords: audio-driven talking portrait, 3d head synthesis, talking face, 3d talking head, multimodal 3d generation
TL;DR: We introduce Talk3D, a novel framework that reconstructs plausible facial geometries by adopting pre-trained 3D-aware generative priors through generator personalization.
Abstract: Recent audio-driven talking head synthesis methods optimize neural radiance fields (NeRF) on monocular videos, but they often reconstruct incomplete facial geometry because a monocular video provides limited 3D information. We introduce Talk3D, a novel framework that reconstructs plausible facial geometries by adopting pre-trained 3D-aware generative priors through generator personalization. Our audio-guided attention U-Net architecture predicts dynamic face variations in NeRF space driven by input audio, with conditioning tokens that disentangle scene variations unrelated to audio. Talk3D generates realistic frames even under extreme head poses and outperforms existing methods in extensive quantitative and qualitative evaluations.
Supplementary Material: zip
Submission Number: 6