Abstract: Human action recognition is important for many applications such as surveillance monitoring, safety, and healthcare.
As 3D body skeletons can accurately characterize body actions and are robust to camera views, we propose
a 3D skeleton-based human action recognition method. Unlike existing skeleton-based methods that use only
geometric features for action recognition, we propose a physics-augmented encoder-decoder model that produces
physically plausible geometric features for human action recognition. Specifically, given the input skeleton
sequence, the encoder performs a spatiotemporal graph convolution to produce spatiotemporal features for both
predicting human actions and estimating the generalized positions and forces of body joints. The decoder, implemented
as an ODE solver, takes the joint forces and solves the Euler-Lagrange equation to reconstruct the
skeletons in the next frame. Training the model to jointly minimize the action classification and
3D skeleton reconstruction errors encourages the encoder to produce features that are discriminative and
consistent with both the body skeletons and the underlying body dynamics. The physics-augmented spatiotemporal
features are used for human action classification. We evaluate the proposed method on NTU-RGB+D, a
large-scale dataset for skeleton-based action recognition. Compared with existing methods, our method achieves
higher accuracy and better generalization ability.
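
As a rough illustration of the decoder and the joint objective described above, the following is a minimal sketch and not the authors' implementation. It assumes each joint is a free point mass, so the Euler-Lagrange equation reduces to m_j * q_ddot_j = tau_j + m_j * g, and it assumes a semi-implicit Euler integrator, a cross-entropy classification term, and a mean-squared reconstruction term. The names decoder_step, joint_loss, mass, dt, gravity, and weight are hypothetical; the abstract does not specify any of them.

```python
import numpy as np

def decoder_step(q, q_dot, tau, mass, dt, gravity=None):
    """One semi-implicit Euler step of m_j * q_ddot_j = tau_j + m_j * g.

    q, q_dot, tau: arrays of shape (J, 3) with joint positions,
    velocities, and forces. mass: array of shape (J,).
    The point-mass (diagonal mass matrix) model is an assumption.
    """
    if gravity is None:
        gravity = np.zeros(3)
    # Acceleration from the simplified Euler-Lagrange equation.
    q_ddot = tau / mass[:, None] + gravity
    # Update velocity first, then position with the new velocity.
    q_dot_next = q_dot + dt * q_ddot
    q_next = q + dt * q_dot_next
    return q_next, q_dot_next

def joint_loss(logits, label, q_pred, q_true, weight=1.0):
    """Cross-entropy on action logits plus weighted skeleton MSE.

    The specific loss terms and the weighting are assumptions.
    """
    shifted = logits - logits.max()                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    classification = -log_probs[label]
    reconstruction = np.mean((q_pred - q_true) ** 2)
    return classification + weight * reconstruction

# Usage: two joints, unit force on every coordinate, no gravity.
q0 = np.zeros((2, 3))
v0 = np.zeros((2, 3))
tau = np.ones((2, 3))
mass = np.array([1.0, 2.0])
q1, v1 = decoder_step(q0, v0, tau, mass, dt=0.1)
loss = joint_loss(np.zeros(3), 0, q1, q1)
```

In a trained model the forces tau would come from the encoder rather than being fixed, and the reconstruction term would compare the predicted next-frame skeleton against the observed one.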