View-Invariant Human Action Recognition Via View Transformation Network (VTN)

IEEE Transactions on Multimedia, 2022
Abstract: Since the human body is non-rigid, actions captured from different views often suffer from occlusion and information loss, and view-variant human action recognition remains a challenging problem. To address it, we propose a View Transformation Network (VTN) that realizes view normalization by transforming arbitrary-view action samples to a base view, seeking a view-invariant representation. An attention learning module is designed to learn a co-attention over action samples from different views, encouraging them to produce similar feature representations and thereby erasing the view diversity across views. Extensive and fair evaluations are performed on the UESTC varying-view RGB-D dataset, the NTU RGB-D 60 dataset, and the NTU RGB-D 120 dataset under three evaluation protocols, i.e., X-subject, X-view, and A-view recognition. Experiments show that our VTN model achieves outstanding performance.
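To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of a view-transformation step followed by a co-attention module that pulls features from two views toward a shared representation. The module names, feature dimensions, attention design, and the MSE invariance loss are all assumptions for illustration; the abstract does not specify the actual VTN architecture or training objective.

```python
# Hypothetical sketch of the VTN idea: transform arbitrary-view features
# toward a base view, then apply co-attention across two views so both
# yield similar (view-invariant) representations. All names and sizes
# are assumed, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewTransform(nn.Module):
    """Maps arbitrary-view features into an assumed base-view space."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x):
        return self.net(x)

class CoAttention(nn.Module):
    """Shared cross-attention between two views (assumed form)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):
        # Each view attends to the other with shared weights; pooled
        # outputs are later pulled together by an invariance loss.
        a2, _ = self.attn(a, b, b)
        b2, _ = self.attn(b, a, a)
        return a2.mean(dim=1), b2.mean(dim=1)

# Toy usage: features of the same action observed from two views.
feat_view1 = torch.randn(8, 16, 256)  # (batch, time, dim)
feat_view2 = torch.randn(8, 16, 256)
vt, co = ViewTransform(), CoAttention()
z1, z2 = co(vt(feat_view1), vt(feat_view2))
invariance_loss = F.mse_loss(z1, z2)  # encourage view-invariant features
print(z1.shape, invariance_loss.item())
```

In this sketch, weight sharing in the co-attention module is what nudges both views toward a common representation; the explicit similarity loss is one plausible way to "erase view diversity," not necessarily the one the authors use.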