Efficient spatiotemporal context modeling for action recognition

Congqi Cao, Yue Lu, Yifan Zhang, Dongmei Jiang, Yanning Zhang

Published: 2023, Last Modified: 25 Aug 2024. Venue: Neurocomputing 2023. License: CC BY-SA 4.0

Abstract: Highlights
• We extend 2D criss-cross attention (CCA) to 3D, giving it the ability to model sparse context in spatiotemporal space. Compared with non-local attention, CCA-3D greatly reduces the complexity of spatiotemporal context modeling, and hence the computational and memory burden.
• We stack CCA-3D modules and devise a novel recurrent structure that can leverage appearance for dense spatiotemporal context modeling. The proposed RCCA-3D structure addresses the inability of the original RCCA-2D structure to model the entire spatiotemporal context, and it is more suitable for action recognition than a direct 3D extension of RCCA-2D.
• We conduct extensive experiments with 3 backbones on 5 RGB-based and skeleton-based datasets to comprehensively verify the effectiveness of our method. All backbones equipped with RCCA-3D achieve better, leading performance on those datasets.
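
The complexity claim and the need for a 3D-specific recurrent structure can be illustrated with a small counting sketch. It is not the authors' implementation. It assumes, as an illustration only, that a 3D criss-cross pass lets each position attend to the positions sharing its temporal line, row, or column (T + H + W - 2 positions, itself included), and that non-local attention compares every pair of positions. The function names and the feature-map sizes are hypothetical.

```python
def attended_pairs_non_local(t, h, w):
    """Query-key pairs when every position attends to every position."""
    n = t * h * w
    return n * n


def attended_pairs_criss_cross_3d(t, h, w):
    """Query-key pairs under the assumed axis-aligned (criss-cross) pattern.

    Each position sees t + h + w - 2 positions: its three axis lines,
    with the position itself counted once.
    """
    n = t * h * w
    return n * (t + h + w - 2)


def reachable_after(steps, shape):
    """Count positions whose information can reach position (0, 0, 0)
    after `steps` stacked axis-aligned attention passes (assumed pattern)."""
    t, h, w = shape
    reached = {(0, 0, 0)}
    for _ in range(steps):
        expanded = set(reached)
        for (a, b, c) in reached:
            expanded.update((x, b, c) for x in range(t))  # temporal line
            expanded.update((a, y, c) for y in range(h))  # vertical line
            expanded.update((a, b, z) for z in range(w))  # horizontal line
        reached = expanded
    return len(reached)


if __name__ == "__main__":
    t, h, w = 8, 14, 14  # hypothetical feature-map size
    print("non-local pairs:  ", attended_pairs_non_local(t, h, w))
    print("criss-cross pairs:", attended_pairs_criss_cross_3d(t, h, w))
    for steps in (1, 2, 3):
        print(steps, "pass(es) reach", reachable_after(steps, (t, h, w)),
              "of", t * h * w, "positions")
```

Under these assumptions, two stacked passes reach only positions that differ from the query in at most two coordinates, so a two-pass recurrence carried over from 2D leaves part of the spatiotemporal volume uncovered, while three passes cover it all. How RCCA-3D actually arranges its recurrence is described in the paper itself, not in this sketch.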