Co-Speech Gesture Generation via Audio and Text Feature Engineering

Published: 04 Sept 2023, Last Modified: 30 Oct 2023, GENEA Challenge 2023 Workshop proceeding, Readers: Everyone
Keywords: Human-computer interaction (HCI), Gesture generation, Deep learning, Multimodal Learning
Abstract: In recent years, human-computer interaction (HCI) research has seen increasing efforts to model social intelligence and behavior with artificial intelligence. For human-agent communication to evolve in a "human way", non-verbal features are an important factor. We conducted our research as part of the GENEA Challenge 2023, where the task is to generate human gestures using these non-verbal elements. We applied two main approaches to generating natural gestures. First, we modified the provided baseline model to use RoBERTa-based embeddings of the speech transcription. Second, we designed a gesture generation model that adds the zero-crossing rate and rhythmical features to the input features. The gestures generated by this method were evaluated as unnatural in terms of human-likeness and conformity. Building on these results, we plan to study state-of-the-art (SOTA) model structures for gesture generation and to apply various preprocessing methods to the input data in order to generate natural gestures.
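As an illustration of the zero-crossing rate feature mentioned in the abstract, the following is a minimal sketch of a frame-wise computation. It is not the authors' implementation: the function name `zero_crossing_rate`, the `frame_length` and `hop_length` defaults, and the synthetic test tones are all assumptions chosen for the example.

```python
import math


def zero_crossing_rate(samples, frame_length=1024, hop_length=512):
    """Return, for each frame, the fraction of adjacent sample pairs
    whose signs differ (assumed framing parameters, not from the paper)."""
    rates = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        # Count sign changes between consecutive samples in this frame.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / (frame_length - 1))
    return rates


# Hypothetical input: one second of 16 kHz audio for two pure tones.
sr = 16000
low = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
high = [math.sin(2 * math.pi * 2000 * n / sr) for n in range(sr)]

zcr_low = zero_crossing_rate(low)
zcr_high = zero_crossing_rate(high)
```

In this sketch the higher-pitched tone yields a larger rate in every frame. A per-frame sequence like this could in principle be aligned with the audio frames and concatenated with other input features, although the abstract does not say how the authors did so.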