Abstract: This research aims to create a data-driven, end-to-end model for multimodal forecasting of the body pose and gestures of virtual avatars. A novel aspect of this work is that it combines both narrative and dialogue for pose forecasting. In a narrative, language takes a third-person perspective to describe the avatar's actions; in dialogue, both first- and second-person perspectives must be integrated to accurately forecast the avatar's pose. A speaker's gestures and poses are also linked to other modalities, namely language and acoustics, and we exploit these correlations to better predict the avatar's pose.
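The abstract does not specify an architecture, but a minimal sketch of one way to realize such an end-to-end multimodal forecaster is shown below: pooled language and acoustic embeddings are fused with an encoding of the observed pose history, and a recurrent decoder predicts a sequence of future poses. All names, dimensions, and layer choices here (`MultimodalPoseForecaster`, `text_dim`, `audio_dim`, `horizon`, the GRU encoder/decoder) are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class MultimodalPoseForecaster(nn.Module):
    """Hypothetical sketch of an end-to-end multimodal pose forecaster.

    Fuses a language embedding (narrative or dialogue) and acoustic
    features with the observed pose history to forecast future poses.
    Dimensions and layer choices are assumptions for illustration.
    """

    def __init__(self, text_dim=768, audio_dim=128, hidden_dim=256,
                 num_joints=21, horizon=30):
        super().__init__()
        self.horizon = horizon
        # Project each modality into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Encode the observed pose history (x, y, z per joint) with a GRU.
        self.pose_encoder = nn.GRU(num_joints * 3, hidden_dim, batch_first=True)
        # Fuse the three modality vectors and decode the future sequence.
        self.fusion = nn.Linear(hidden_dim * 3, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_joints * 3)

    def forward(self, text_emb, audio_emb, past_poses):
        # text_emb:   (B, text_dim)   pooled sentence embedding
        # audio_emb:  (B, audio_dim)  pooled acoustic features
        # past_poses: (B, T, num_joints * 3) observed pose history
        _, h = self.pose_encoder(past_poses)          # h: (1, B, hidden)
        fused = torch.tanh(self.fusion(torch.cat(
            [self.text_proj(text_emb), self.audio_proj(audio_emb), h[-1]],
            dim=-1)))
        # Feed the fused context at every future step and decode poses.
        dec_in = fused.unsqueeze(1).expand(-1, self.horizon, -1)
        out, _ = self.decoder(dec_in)
        return self.head(out)  # (B, horizon, num_joints * 3)

# Usage with random stand-in data: 4 sequences, 10 observed frames.
model = MultimodalPoseForecaster()
pred = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 10, 63))
print(pred.shape)  # torch.Size([4, 30, 63])
```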