Keywords: Neural Radiance Field, Talking Portrait Synthesis, Wild Scenario
TL;DR: We introduce WildTalker (Talking Portrait Synthesis in the Wild), a novel approach for generating high-quality talking portraits that handles both dynamic movements and noisy audio.
Abstract: We introduce WildTalker, a novel approach for synthesizing high-quality talking portraits under real-world conditions. Traditional methods often struggle with unpredictable movements and noisy audio. WildTalker addresses both problems: flow-guided temporal masking handles dynamic regions by capturing and de-emphasizing transient areas, and multi-scale spectral subtraction provides robust audio denoising. Together, these components allow WildTalker to produce natural talking portraits with accurate lip synchronization in both controlled and variable scenarios. Our experiments demonstrate that WildTalker significantly improves the quality of audio-driven 3D talking portraits in dynamic settings and achieves superior lip synchronization under challenging audio conditions. The results show that our method outperforms existing approaches in real-world scenarios as well as in controlled environments, underscoring its potential for practical applications.
Submission Number: 27
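
The abstract names multi-scale spectral subtraction as the audio denoising component but does not give its details. The following is a minimal NumPy sketch of one generic way such a step could look, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the window sizes (256, 512, 1024), the over-subtraction factor alpha, the spectral floor, averaging the per-scale outputs, and the premise that the first noise_len samples of the recording contain noise only.

```python
import numpy as np


def stft(x, n_fft, hop):
    """Hann-windowed STFT; the signal is zero-padded by n_fft on both sides."""
    window = np.hanning(n_fft)
    padded = np.pad(x, (n_fft, n_fft))
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1), window


def istft(spec, window, hop, length):
    """Weighted overlap-add inverse of stft(); returns `length` samples."""
    n_fft = len(window)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[n_fft : n_fft + length]  # drop the padding added in stft()


def spectral_subtract(x, n_fft, noise_len, alpha=2.0, floor=0.05):
    """Single-scale magnitude spectral subtraction.

    Assumes (hypothetically) that x[:noise_len] holds noise only.
    """
    hop = n_fft // 4
    spec, window = stft(x, n_fft, hop)
    mag = np.abs(spec)
    # Frames that lie fully inside the noise-only segment (skip padded frames).
    first = n_fft // hop
    last = first + max(1, (noise_len - n_fft) // hop + 1)
    noise_mag = mag[first:last].mean(axis=0)
    # Over-subtract the noise estimate, but keep a small spectral floor.
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
    gain = clean_mag / np.maximum(mag, 1e-8)
    return istft(spec * gain, window, hop, len(x))


def multi_scale_spectral_subtraction(x, noise_len, scales=(256, 512, 1024)):
    """Average the denoised signals obtained at several window sizes."""
    return np.mean([spectral_subtract(x, n, noise_len) for n in scales], axis=0)
```

Under these assumptions, a call such as multi_scale_spectral_subtraction(audio, noise_len=4000) would treat the first 0.25 s of a 16 kHz recording as the noise reference. Short windows track fast changes in the signal while long windows resolve frequency more finely, which is the usual motivation for combining several scales.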