Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN

Bowen Wu; Chaoran Liu; Carlos Ishi; Hiroshi Ishiguro

Published: 19 Jul 2021, Last Modified: 05 May 2023, GENEA Workshop 2021 Oral, Readers: Everyone

Abstract: Gestures are crucial for making agents and robots more human-like and for achieving smoother interactions with humans. Such agents therefore need an effective system that models human gestures matched to speech utterances. In this work, we propose a GRU-based autoregressive model for gesture generation, trained adversarially against a CNN-based discriminator using a WGAN-based learning algorithm. The model is trained to output the rotation angles of the upper-body joints and is used to animate a CG avatar. The motions synthesized by the proposed system are evaluated with an objective measure and a subjective experiment, which show that the proposed model outperforms a baseline trained on the same dataset with a state-of-the-art GAN-based algorithm. This result indicates that a stable and robust learning algorithm is essential for training gesture generation models. Our code is available at https://github.com/wubowen416/gesture-generation.
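
For orientation, the adversarial setup named in the abstract can be related to the standard Wasserstein GAN objective, sketched below. This is the generic textbook formulation, not necessarily the exact loss used in the paper: the abstract does not say how the Lipschitz constraint is enforced, whether the discriminator also sees the speech input, or which auxiliary terms are added. The symbols are assumptions introduced here for illustration: x is a real gesture sequence (joint rotation angles), s the speech features, z a noise input, G the GRU-based generator, and D the CNN-based discriminator (critic).

```latex
% Generic WGAN objective (illustrative; symbols x, s, z, G, D are assumed names).
% The critic D is restricted to 1-Lipschitz functions.
\min_{G} \; \max_{\|D\|_{L} \le 1} \;
  \mathbb{E}_{x \sim P_{\mathrm{data}}} \bigl[ D(x) \bigr]
  \; - \;
  \mathbb{E}_{z \sim P_{z},\; s \sim P_{\mathrm{speech}}} \bigl[ D\bigl(G(z, s)\bigr) \bigr]
```

In this formulation the critic estimates a distance between real and generated motion distributions rather than a real-or-fake probability, which is the usual reason WGAN training is regarded as more stable than the original GAN loss.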

Supplementary Material: zip
