A Deep Learning-Based Model for Head and Eye Motion Generation in Three-party Conversations

Published: 01 Jan 2019, Last Modified: 15 May 2023 · Proc. ACM Comput. Graph. Interact. Tech. 2019
Abstract: In this paper we propose a novel deep learning-based approach to generate realistic three-party head and eye motions from acoustic speech input together with speaker marking (i.e., the speaking time of each interlocutor). Specifically, we first acquire a high-quality three-party conversational motion dataset. Based on this dataset, we then train a deep learning-based framework to automatically predict the dynamic head and eye directions of all the interlocutors from the speech signal input. By combining our method with existing lip-sync and speech-driven hand/body gesture generation algorithms, we can generate realistic three-party conversational animations. Through extensive experiments and comparative user studies, we demonstrate that our approach can generate realistic three-party head-and-eye motions from novel speech recorded from new subjects of different genders and ethnicities.
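
The abstract does not detail the network architecture, but the mapping it describes (per-frame acoustic features plus a speaker marking, mapped to head and eye directions for all three interlocutors) could be sketched as follows. This is a minimal, hypothetical PyTorch sketch: the LSTM backbone, the feature dimensions, and the output parameterization are all assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn

class GazeHeadPredictor(nn.Module):
    """Hypothetical speech-to-motion sketch (the abstract does not specify
    the architecture; a two-layer LSTM is assumed here for illustration).

    Input per frame: acoustic features (e.g., MFCCs) concatenated with a
    one-hot speaker marking over the three interlocutors.
    Output per frame: head and eye direction parameters for all three
    interlocutors (assumed here as 8 values each, e.g., yaw/pitch for
    the head and for each eye).
    """

    def __init__(self, n_acoustic=26, n_speakers=3, hidden=128, out_per_person=8):
        super().__init__()
        in_dim = n_acoustic + n_speakers  # speech features + speaker one-hot
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_speakers * out_per_person)

    def forward(self, x):
        # x: (batch, frames, in_dim) -> (batch, frames, 3 * out_per_person)
        h, _ = self.rnn(x)
        return self.head(h)

# Usage: a batch of 2 clips, 100 frames each
model = GazeHeadPredictor()
features = torch.randn(2, 100, 26 + 3)
motions = model(features)  # torch.Size([2, 100, 24])
```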