Speech-driven facial animation with spectral gathering and temporal attention

Published: 01 Jan 2022, Last Modified: 11 Nov 2023. Frontiers Comput. Sci. 2022.
Abstract: In this paper, we present an efficient algorithm that generates lip-synchronized facial animation from a given vocal audio clip. By combining a spectral-dimensional bidirectional long short-term memory (LSTM) with a temporal attention mechanism, we design a lightweight speech encoder that learns useful, robust vocal features from the input audio without resorting to pre-trained speech-recognition modules or large training data. To learn subject-independent facial motion, we use deformation gradients as the internal representation, which allows nuanced local motions to be synthesized better than with vertex offsets. Compared with state-of-the-art methods built on automatic speech recognition, our model is much smaller yet achieves comparable robustness and quality in most cases, and noticeably better results in certain challenging ones.
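The abstract does not include code, but the encoder it describes — a bidirectional LSTM run along the spectral (frequency) axis of each audio frame, followed by attention pooling over time — can be sketched as follows. This is a hypothetical PyTorch reconstruction under stated assumptions: the mel-bin count, hidden sizes, and pooling details are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

class SpectralBiLSTMEncoder(nn.Module):
    """Sketch of a lightweight speech encoder in the spirit of the paper:
    a BiLSTM gathers features across the spectral dimension of each frame,
    then soft attention weights the frames over time. All layer sizes are
    assumptions for illustration."""

    def __init__(self, n_mels=80, hidden=64, out_dim=128):
        super().__init__()
        # The BiLSTM treats the frequency bins of a single frame as a sequence.
        self.spectral_lstm = nn.LSTM(
            input_size=1, hidden_size=hidden,
            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, out_dim)
        # One scalar attention score per time step.
        self.attn = nn.Linear(out_dim, 1)

    def forward(self, spec):
        # spec: (batch, time, n_mels) log-mel spectrogram
        b, t, f = spec.shape
        x = spec.reshape(b * t, f, 1)        # each frame becomes a freq-bin sequence
        h, _ = self.spectral_lstm(x)         # (b*t, f, 2*hidden)
        frame = self.proj(h[:, -1])          # last spectral state summarizes the frame
        frame = frame.reshape(b, t, -1)      # (b, t, out_dim)
        w = torch.softmax(self.attn(frame), dim=1)  # temporal attention weights
        return (w * frame).sum(dim=1)        # (b, out_dim) pooled vocal feature

enc = SpectralBiLSTMEncoder()
feat = enc(torch.randn(2, 50, 80))  # 2 clips, 50 frames, 80 mel bins
```

In the actual model the pooled feature would drive a decoder that predicts per-frame deformation gradients of the face mesh; the sketch stops at the encoder, which is the part the abstract specifies.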