Abstract: This paper presents an approach that animates facial expressions through speech analysis. An individualized 3D head model is first generated by modifying a generic head model on which a set of MPEG-4 Facial Definition Parameters (FDPs) has been pre-defined. To animate realistic facial expressions on the 3D head model, key frames of facial expressions are calculated from motion-captured data. A speech analysis module obtains mouth shapes, which are converted into MPEG-4 Facial Animation Parameters (FAPs) that drive the 3D head model with the corresponding facial expressions. The approach has been implemented as a real-time speech-driven facial animation system. When applied to the Internet, our talking head system can serve as a vivid website presenter and requires only 14 Kbps plus an additional header image (about 30 Kbytes in CIF format, JPEG compressed). The system synthesizes facial animation at more than 30 frames/sec on a Pentium III 500 MHz PC. Currently, data streaming is implemented with the Microsoft ASF format and supported under Internet Explorer and Netscape Navigator.
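As a rough illustration of the mouth-shape-to-FAP conversion summarized above, the following C++ sketch blends per-viseme key frames into a low-level FAP vector. It is not taken from the paper: the viseme labels, FAP indices, and magnitudes are illustrative assumptions only, standing in for the key frames derived from motion-captured data.

```cpp
// Illustrative sketch: blend viseme key frames into a 68-entry FAP vector,
// roughly mirroring the "mouth shapes -> FAPs" step described in the abstract.
// FAP indices and per-viseme values below are assumptions, not the paper's data.
#include <array>
#include <cstdio>
#include <map>
#include <string>

constexpr int kNumFaps = 68;                 // MPEG-4 defines 68 FAPs
using FapVector = std::array<double, kNumFaps>;

// Hypothetical key frames: a few low-level FAPs per viseme
// (index 2 ~ jaw opening, 5/6 ~ lip-corner stretch), in FAPU-scaled units.
std::map<std::string, FapVector> MakeVisemeKeyFrames() {
  std::map<std::string, FapVector> key_frames;
  FapVector neutral{};                       // all zeros = neutral face
  key_frames["sil"] = neutral;

  FapVector aa = neutral;
  aa[2] = 600.0;                             // wide-open jaw for /aa/
  key_frames["aa"] = aa;

  FapVector iy = neutral;
  iy[5] = 250.0;                             // stretched lip corners for /iy/
  iy[6] = 250.0;
  key_frames["iy"] = iy;
  return key_frames;
}

// Linear blend between two viseme key frames; t in [0, 1].
FapVector BlendFaps(const FapVector& from, const FapVector& to, double t) {
  FapVector out{};
  for (int i = 0; i < kNumFaps; ++i)
    out[i] = (1.0 - t) * from[i] + t * to[i];
  return out;
}

int main() {
  auto frames = MakeVisemeKeyFrames();
  // Halfway through a transition from silence to /aa/.
  FapVector fap = BlendFaps(frames["sil"], frames["aa"], 0.5);
  std::printf("jaw-opening FAP at t=0.5: %.1f\n", fap[2]);  // expect 300.0
  return 0;
}
```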