Abstract: Recently, we have witnessed a boom in applications of 3D talking face generation. However, most existing 3D face generation methods produce only faces with a static head pose, which is at odds with how humans actually perceive talking faces. Only a few works address head pose generation, and even these ignore person-specific pose characteristics. In this paper, we propose a unified audio-driven approach that endows 3D talking faces with personalized pose dynamics. To this end, we build a new person-specific dataset that provides the corresponding head poses and face shapes for each video. Our framework consists of two separate modules: PoseGAN and PGFace. Given an input audio clip, PoseGAN first produces a head pose sequence for the 3D head; PGFace then uses the audio and pose information to generate natural face models. Combining the two modules yields a 3D talking head with dynamic head movement. Experiments show that our method generates person-specific head pose sequences that are in sync with the input audio and closely match how humans experience talking heads.
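For intuition, the following is a minimal, hypothetical sketch of the two-stage pipeline the abstract describes: audio first drives a pose generator, and the generated pose then conditions the face generator. All module internals, feature dimensions, and output parameterizations (a 6-DoF per-frame head pose, low-dimensional face coefficients) are assumptions for illustration; the abstract gives no implementation details.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the abstract's two-stage design (PoseGAN -> PGFace).
# Architectures, dimensions, and parameterizations below are assumptions.

class PoseGenerator(nn.Module):
    """Maps an audio feature sequence to a per-frame head pose sequence
    (assumed here to be 6 values: rotation + translation)."""
    def __init__(self, audio_dim=80, hidden_dim=256, pose_dim=6):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, pose_dim)

    def forward(self, audio_feats):           # (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.head(h)                   # (B, T, pose_dim)

class PoseGuidedFace(nn.Module):
    """Fuses audio features with the generated pose to predict per-frame
    face parameters (assumed here to be 3DMM-style coefficients)."""
    def __init__(self, audio_dim=80, pose_dim=6, hidden_dim=256, face_dim=64):
        super().__init__()
        self.rnn = nn.GRU(audio_dim + pose_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, face_dim)

    def forward(self, audio_feats, poses):    # (B, T, audio_dim), (B, T, pose_dim)
        h, _ = self.rnn(torch.cat([audio_feats, poses], dim=-1))
        return self.head(h)                   # (B, T, face_dim)

# Usage: audio -> pose sequence -> pose-conditioned face parameters.
audio = torch.randn(1, 100, 80)               # e.g. 100 frames of mel features
poses = PoseGenerator()(audio)
faces = PoseGuidedFace()(audio, poses)
print(poses.shape, faces.shape)               # (1, 100, 6) and (1, 100, 64)
```

The key design point the abstract emphasizes is the decoupling: head pose is generated from audio alone, and the face model is conditioned on both audio and the generated pose, so pose dynamics stay person-specific while lip motion stays in sync with speech.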