Track: Challenge paper
Team Name: IVI Lab
Keywords: co-speech gesture generation, sequence-to-sequence modeling, locality-constraint attention
Abstract: This paper describes the IVI Lab entry to the GENEA Challenge 2022. We formulate gesture generation as a sequence-to-sequence conversion task that takes text, audio, and speaker identity as inputs and outputs body motion. We use the Tacotron2 architecture as our backbone, with a locality-constraint attention mechanism that guides the decoder to learn dependencies from neighboring latent features. The collective evaluation released by the GENEA Challenge 2022 indicates that our two entries (FSH for the full-body track and USK for the upper-body track) statistically significantly outperform the audio-driven and text-driven baselines on both subjective metrics. Remarkably, our full-body entry receives the highest speech appropriateness (60.5% matched) among all submitted entries. We also conduct an objective evaluation comparing the acceleration and jerk of our generated motion with those of two autoregressive baselines. The result indicates that the motion distribution of our generated gestures is much closer to the distribution of natural gestures.
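The abstract describes the locality constraint only at a high level. As an illustration of the general idea, the sketch below shows one common way to impose such a constraint: adding a Gaussian window, centered on the previously attended encoder position, to the raw attention scores before the softmax, so that the decoder favors neighboring latent features. All names and parameters here (e.g. `prev_peak`, `sigma`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def locality_constrained_attention(scores, prev_peak, sigma=2.0):
    """Bias attention toward positions near the previously attended index.

    scores: raw (pre-softmax) attention scores over encoder time steps.
    prev_peak, sigma: illustrative parameters, not taken from the paper.
    """
    positions = np.arange(scores.shape[-1])
    # Gaussian window centered on the previous attention peak,
    # applied as an additive bias in log-space.
    window = -((positions - prev_peak) ** 2) / (2.0 * sigma ** 2)
    biased = scores + window
    # Numerically stable softmax over encoder time steps.
    e = np.exp(biased - biased.max())
    return e / e.sum()

# With uniform raw scores, the constraint alone concentrates the
# attention weights around the previous peak position.
weights = locality_constrained_attention(np.zeros(10), prev_peak=4, sigma=1.5)
```

In this toy call the raw scores are all zero, so the resulting weights peak at index 4 purely because of the locality window; in a real decoder the bias would be combined with learned content-based scores at every step.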