Gesture Generation with Diffusion Models Aided by Speech Activity Information

Published: 04 Sept 2023, Last Modified: 09 Jul 2024 | GENEA Challenge 2023 Workshop proceeding | Readers: Everyone
Keywords: Gesture generation, co-speech gestures, diffusion models
Abstract: This paper describes a gesture generation model based on state-of-the-art diffusion models. Novel adaptations were introduced to improve human-likeness and the appropriateness of motion relative to speech. The main focus was to make gestures more responsive to speech audio; in particular, we explored using a pre-trained Voice Activity Detector (VAD) to obtain more meaningful audio representations. The proposed model was submitted to the GENEA Challenge 2023, where perceptual experiments compared our model, labeled SH, with the other submissions. The results indicated that our model achieved competitive levels of human-likeness. Although its score for appropriateness to the agent's speech was lower than that of most entries, the differences from most models were not statistically significant at the confidence level used in the evaluation.