Skip-attention encoder–decoder framework for human motion prediction

Ruipeng Zhang, Xiangbo Shu, Rui Yan, Jiachao Zhang, Yan Song

Published: 01 Apr 2022, Last Modified: 31 Mar 2026. Multimedia Systems. CC BY-SA 4.0

Abstract: Human motion prediction aims to predict a future motion sequence from an observed human motion sequence. In this paper, we propose a novel skip-attention encoder–decoder (SAED) framework that models human motion dependencies in spatiotemporal space, using the encoder to encode the observed motions and the decoder to decode the predicted motions. The framework rests on two main ideas. First, we design a new self-renewing ConvGRU as the unit of both the encoder and the decoder to effectively capture temporal and spatial skeleton-motion dependencies. Second, we present a new skip-attention mechanism (SAM) that aggregates the motion information of all layers according to their importance. Quantitative and qualitative results on the Human3.6M and CMU motion capture datasets show the effectiveness of the proposed SAED compared with related methods.
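
The idea of aggregating per-layer information by importance can be illustrated with a generic attention-weighted sum. The sketch below is not the paper's SAM: the abstract does not specify how importance is computed, so the dot-product scoring, the softmax normalization, and the names `aggregate_layers`, `layer_states`, and `score_weights` are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_states, score_weights):
    """Fuse per-layer feature vectors with importance weights.

    Assumption (not from the paper): each layer's importance is the
    softmax of a dot product between its state and a scoring vector.
    """
    # One scalar score per layer.
    scores = [
        sum(w * h for w, h in zip(score_weights, state))
        for state in layer_states
    ]
    # Normalize scores so the importance weights sum to 1.
    weights = softmax(scores)
    # Weighted sum of the layer states, computed per feature dimension.
    dim = len(layer_states[0])
    fused = [
        sum(a * state[i] for a, state in zip(weights, layer_states))
        for i in range(dim)
    ]
    return weights, fused


# Hypothetical toy input: three layers, two features each.
layer_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score_weights = [2.0, 0.5]
weights, fused = aggregate_layers(layer_states, score_weights)
print(weights, fused)
```

In this toy setup, layers whose states align more strongly with the scoring vector receive larger weights, so they contribute more to the fused representation; a learned scoring function would take the place of the fixed vector in practice.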

External IDs: doi:10.1007/s00530-021-00807-4