Abstract: Human motion prediction, an important and challenging task in computer vision, aims to predict a future human motion sequence from a given historical sequence. Although existing methods perform well with carefully designed networks, they do not exploit the semantic information within the input sequences. Motivated by the observation that a human motion sequence correlates strongly with its semantic class, we propose a class-guided network for predicting future human poses. Specifically, the semantic class of the historical motion sequence is incorporated through a class-guided loss function, which guides the network to predict class-specific poses. Furthermore, we devise two additional spatial-temporal supervision signals to improve the stability and smoothness of the predicted motion sequence: a spatial multi-scale loss promotes stability by minimizing the difference between the predictions and the ground truth at multiple scales, and a multi-temporal loss enhances smoothness by narrowing the kinetics difference of human motion sequences. Experimental results on two benchmark datasets (Human3.6M and CMU Mocap) demonstrate that the proposed supervision signals effectively improve prediction accuracy and that our method achieves new state-of-the-art performance. Our code is available at https://github.com/cobblestones/CGHMP.
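The abstract names the two auxiliary supervision signals but does not give their formulas, so the following NumPy sketch is only one plausible reading, not the authors' implementation. It assumes poses are stored as arrays of shape (frames, joints, 3), that coarser spatial scales are obtained by averaging hypothetical groups of joints, and that the "kinetics difference" is measured on finite differences over time (velocity, acceleration). The function names, the `groupings` argument, and the unweighted sums are all assumptions made for illustration; the class-guided loss is not sketched here.

```python
import numpy as np


def mean_joint_error(pred, target):
    """Mean Euclidean distance per joint; inputs have shape (T, J, 3)."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def multi_scale_loss(pred, target, groupings):
    """Joint error at the full skeleton plus each coarser scale (assumed form).

    `groupings` is a list of scales. Each scale is a list of joint-index
    lists, and the joints in one list are averaged into a single coarse
    joint. This grouping scheme is hypothetical.
    """
    loss = mean_joint_error(pred, target)
    for groups in groupings:
        coarse_pred = np.stack([pred[:, g].mean(axis=1) for g in groups], axis=1)
        coarse_tgt = np.stack([target[:, g].mean(axis=1) for g in groups], axis=1)
        loss += mean_joint_error(coarse_pred, coarse_tgt)
    return loss


def multi_temporal_loss(pred, target, max_order=2):
    """Error between temporal finite differences of orders 1..max_order.

    Order 1 compares frame-to-frame velocities, order 2 accelerations.
    Treating these as the "kinetics" terms is an assumption.
    """
    loss = 0.0
    for order in range(1, max_order + 1):
        d_pred = np.diff(pred, n=order, axis=0)
        d_tgt = np.diff(target, n=order, axis=0)
        loss += mean_joint_error(d_pred, d_tgt)
    return loss
```

Under this reading, a prediction that is shifted by a constant offset incurs a spatial penalty at every scale but almost no temporal penalty, since the offset cancels in the finite differences. That illustrates why the two signals could complement each other: one constrains where the pose is, the other how it moves.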