Abstract: Talking face generation aims at synthesizing coherent and realistic face sequences from input speech. The task enjoys a wide spectrum of downstream applications, such as teleconferencing, movie dubbing, and virtual assistants. The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation. Despite great research efforts in talking face generation, the problem remains challenging due to the need for fine-grained control of face components and for generalization to arbitrary sentences. In this chapter, we first discuss the definition and underlying challenges of the problem. Then, we present an overview of recent progress in talking face generation. In addition, we introduce some widely used datasets and performance metrics. Finally, we discuss open questions, potential future directions, and ethical considerations in this task.