Abstract: In this paper, we present an approach that animates facial expressions through speech analysis. An individualized 3D head model is first generated by modifying a generic head model on which a set of MPEG-4 Facial Definition Parameters (FDPs) has been pre-defined. To animate the facial expressions of the 3D head model, a real-time speech analysis module extracts mouth shapes, which are converted to MPEG-4 Facial Animation Parameters (FAPs) that drive the 3D head model with the corresponding facial expressions. The approach has been implemented as a real-time speech-driven facial animation system. On a PC with a single Pentium-III 500MHz CPU, the system runs at around 15–24 frames/sec at an image size of 120×150. The input is live audio, and the initial delay is within 4 seconds. An ongoing model-based visual communication system that integrates a 3D head motion estimation technique with this system is also described.
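The pipeline summarized above (speech analysis → mouth shapes → MPEG-4 FAPs → animated head model) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual mapping: the function names, the normalized mouth-shape parameters, and the FAP scaling factors are all assumptions; only the FAP names and numbers (open_jaw is FAP 3, stretch_l/r_cornerlip are FAPs 6–7) come from the MPEG-4 standard.

```python
def mouth_shape_to_faps(openness, width):
    """Map a normalized mouth shape (openness, width in [0, 1]) to a few
    illustrative MPEG-4 FAPs. The *1000 / *500 scales are hypothetical
    placeholders for the calibration a real system would perform."""
    return {
        "open_jaw": int(openness * 1000),         # FAP 3: vertical jaw opening
        "stretch_l_cornerlip": int(width * 500),  # FAP 6: left lip-corner stretch
        "stretch_r_cornerlip": int(width * 500),  # FAP 7: right lip-corner stretch
    }

def animate(mouth_shapes):
    """Convert a per-frame sequence of (openness, width) mouth shapes,
    as produced by a speech analysis module, into FAP frames."""
    return [mouth_shape_to_faps(o, w) for (o, w) in mouth_shapes]

# Example: two audio frames, mouth opening then closing.
fap_frames = animate([(0.8, 0.3), (0.1, 0.1)])
```

In a live system these FAP frames would be applied to the FDP-calibrated head model at the reported 15–24 frames/sec.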