Spatial-temporal hypergraph based on dual-stage attention network for multi-view data lightweight action recognition

Published: 01 Jan 2024, Last Modified: 13 May 2025 · Pattern Recognit. 2024 · CC BY-SA 4.0
Abstract: Highlights
• Dual-stage attention network: The network comprises two stages: a Temporal Attention Mechanism based on a Trainable Threshold (TAM-TT) and Hypergraph Convolution based on a Dynamic Spatial-Temporal Attention Mechanism (HG-DSTAM).
• Salient regions: HG-DSTAM divides the human joints into three parts (trunk, hand, and leg) to build spatial-temporal hypergraphs, extracts high-order features from the hypergraphs constructed from multi-view body joints, feeds these features into the dynamic spatial-temporal attention mechanism, and learns the intra-frame correlations among body-part joint features across views, yielding the salient regions of an action.
• Spatial-temporal hypergraph neural network: It learns which body parts exhibit the highest frequency of action dynamics, allowing us to construct multiple spatial-temporal hypergraphs and update the weight of each human body node, thereby obtaining the salient regions of the actions.
• Multi-view: We adopt a multi-view video data acquisition method, which provides richer information than static images or single-view video, with potential consistency and complementarity across views, addressing the under-determined nature of the data available for action recognition in complex scenes.
• Action recognition: Action recognition is a challenging task in computer vision. Swift and precise identification of human actions is pivotal for seamless interaction and cooperation between machines and humans, for example in intelligent driving, medical control, and smart surveillance.
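The trunk/hand/leg partition described above can be illustrated with a minimal sketch: each body part becomes one hyperedge over the skeleton's joints, and joint features are propagated with a standard HGNN-style hypergraph convolution. The 25-joint layout, the exact joint indices per part, and the function names are assumptions for illustration, not the paper's actual implementation; the paper's dynamic attention and temporal stage are omitted.

```python
import numpy as np

# Hypothetical 25-joint skeleton (an NTU RGB+D-like layout is assumed here);
# each hyperedge groups the joints of one body part, following the paper's
# trunk / hand / leg partition. The specific index assignments are illustrative.
PARTS = {
    "trunk": [0, 1, 2, 3, 20],
    "hand":  [4, 5, 6, 7, 8, 9, 10, 11, 21, 22, 23, 24],
    "leg":   [12, 13, 14, 15, 16, 17, 18, 19],
}

def incidence_matrix(n_joints, parts):
    """Build the joint-by-hyperedge incidence matrix H (n_joints x n_edges)."""
    H = np.zeros((n_joints, len(parts)))
    for e, joints in enumerate(parts.values()):
        H[joints, e] = 1.0
    return H

def hypergraph_conv(X, H, w=None):
    """One (linear) hypergraph convolution step in the standard HGNN form:
    X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X.
    X: (n_joints, n_features) joint features for a single frame and view.
    w: hyperedge weights; in the paper these would come from the attention stage.
    """
    if w is None:
        w = np.ones(H.shape[1])            # uniform hyperedge weights
    Dv = (H * w).sum(axis=1)               # weighted vertex degrees
    De = H.sum(axis=0)                     # hyperedge degrees
    Dv_is = np.diag(1.0 / np.sqrt(Dv))     # Dv^{-1/2}
    return Dv_is @ H @ np.diag(w / De) @ H.T @ Dv_is @ X

H = incidence_matrix(25, PARTS)
X = np.random.randn(25, 3)                 # e.g., 3D joint coordinates
out = hypergraph_conv(X, H)                # out.shape == (25, 3)
```

In this sketch each joint belongs to exactly one part, so the convolution averages features within each body part; learning per-hyperedge weights `w` is one plausible way a model could emphasize the parts with the most action dynamics, as the highlights describe.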