Speech-Driven Emotional 3D Talking Face Animation Using Emotional Embeddings

Published: 01 Jan 2024, Last Modified: 17 Apr 2025 · ICASSP 2024 · CC BY-SA 4.0
Abstract: Existing emotional 3D talking facial animation methods primarily focus on animating emotional faces under a specific emotion condition. However, in real-world situations, no one consistently speaks with just one emotion, so previous emotion-conditioned approaches have limited applicability in practice. To address this issue, we propose SDETalk, a novel learning framework that animates emotional talking faces by leveraging the emotional cues contained in speech. Unlike previous studies, which use static one-hot emotion conditions, the proposed network regresses complex emotional states directly from speech. This enables the network to produce natural facial animation from emotional speech without an explicit emotion condition. Furthermore, the proposed method also produces head motion, an important factor in the naturalness of talking face animation. As a result, our approach simultaneously achieves accurate lip motion, natural expressions, and rhythmic head motion from emotional speech. Extensive qualitative and quantitative experiments demonstrate that our method outperforms other state-of-the-art methods, animating realistic and expressive 3D faces.
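To make the described pipeline concrete, the sketch below illustrates the general idea of replacing a static one-hot emotion condition with an emotional embedding regressed from the speech itself, which then conditions the prediction of per-frame vertex offsets and head pose. This is a minimal, hypothetical PyTorch sketch, not the authors' SDETalk architecture: all module names, layer sizes, the mean-pooled emotion regressor, and the vertex/pose output dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class EmotionalTalkingFaceSketch(nn.Module):
    """Illustrative sketch of a speech-driven emotional talking-face model.

    Hypothetical design: the abstract only states that a continuous emotional
    state is regressed from speech (instead of a one-hot condition) and used,
    with the speech content, to drive lip motion, expression, and head motion.
    """

    def __init__(self, audio_dim=80, emo_dim=64, n_vertices=5023):
        super().__init__()
        # Content encoder: per-frame acoustic features -> per-frame content features
        self.content_enc = nn.GRU(audio_dim, 256, batch_first=True)
        # Emotion regressor: pools the utterance into a continuous emotion embedding
        # (the replacement for a static one-hot emotion condition)
        self.emo_enc = nn.Sequential(
            nn.Linear(audio_dim, 128), nn.ReLU(), nn.Linear(128, emo_dim)
        )
        # Decoder heads: per-frame 3D vertex offsets and 6-DoF head pose
        self.vertex_head = nn.Linear(256 + emo_dim, n_vertices * 3)
        self.pose_head = nn.Linear(256 + emo_dim, 6)

    def forward(self, audio_feats):
        # audio_feats: (batch, frames, audio_dim), e.g. log-mel features
        content, _ = self.content_enc(audio_feats)
        emo = self.emo_enc(audio_feats.mean(dim=1))            # (batch, emo_dim)
        emo = emo.unsqueeze(1).expand(-1, content.size(1), -1)  # broadcast over frames
        h = torch.cat([content, emo], dim=-1)
        return self.vertex_head(h), self.pose_head(h)           # offsets, head pose


# Usage: a 2-second clip at 30 fps with 80-dim acoustic features per frame
model = EmotionalTalkingFaceSketch()
offsets, pose = model(torch.randn(1, 60, 80))
print(offsets.shape, pose.shape)  # torch.Size([1, 60, 15069]) torch.Size([1, 60, 6])
```

The key design point mirrored from the abstract is that the emotion signal is a learned, speech-derived embedding rather than a fixed label, so mixed or shifting emotions in an utterance can still condition the animation.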