Dual Diffusion Model for One-Shot High-Fidelity Talking Head Generation

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Diffusion model, Talking head generation, Video Synthesis
Abstract: One-shot audio-driven talking head generation is an important task with applications in the movie industry and virtual avatars. However, existing methods struggle to accurately capture the dynamic nuances of the audio-to-lip-motion mapping. Furthermore, GAN-based models that convert lip motion into pixel-level video often suffer from unstable training. Recent diffusion-based approaches address these limitations but still face long inference times and difficulty maintaining temporal consistency due to their stochasticity. To overcome these challenges, we introduce two modules: 1) AToM-Net, which generates audio-to-motion pairs, and 2) MC-VDM, which produces high-quality image sequences corresponding to the generated motion sequences while reflecting a single identity image. Both modules are grounded in the diffusion-model framework. AToM-Net exploits the stochasticity inherent to diffusion models to capture the subtleties of lip-motion dynamics while avoiding mode collapse. MC-VDM resolves the shortcomings of existing diffusion-based talking-head methods with an efficient tri-plane-based module. Experiments on a standard benchmark show that our model surpasses existing models.
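The abstract gives no implementation details, but both AToM-Net and MC-VDM are described as diffusion models, meaning each generates its output through an iterative denoising loop. As a purely illustrative sketch, a generic DDPM-style reverse process for the first stage (audio features conditioning a motion sample) might look like the following; the `atom_net` stub, feature dimensions, and all hyperparameters are hypothetical placeholders, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

def atom_net(x_t, audio_feat, t):
    """Hypothetical stand-in for the learned AToM-Net noise predictor.

    The real model would be a trained network conditioned on audio;
    this stub merely returns a plausible-shaped noise estimate.
    """
    return 0.1 * x_t + 0.01 * audio_feat

def ddpm_sample(denoiser, cond, shape, T=50):
    """Generic DDPM reverse process with a simple linear beta schedule."""
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = denoiser(x, cond, t)           # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                            # no noise added at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

audio_feat = rng.standard_normal(16)         # per-frame audio features (toy size)
motion = ddpm_sample(atom_net, audio_feat, (16,))  # stage 1: audio -> motion
```

In the paper's pipeline, the sampled motion sequence would then condition the second diffusion model (MC-VDM), together with the identity image, to produce the final frames; that stage would reuse the same reverse-process structure with a different denoiser.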
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2624