Abstract: Audio-driven photo-realistic talking face generation has received intensive attention for its ability to enable new human-computer interaction experiences. However, previous works have struggled to balance high definition, lip synchronization, and low customization cost, which degrades the user experience. In this paper, we propose a novel audio-driven talking face generation method that subtly converts the problem of improving video definition into a face reenactment problem, producing face video that is both lip-synchronized and high-definition. The framework is decoupled, meaning the same trained model can be applied to arbitrary characters and audio without further customization training for a specific person, thus significantly reducing cost. Experimental results show that our method achieves high video definition and lip synchronization comparable to existing state-of-the-art methods.