SCaTNet: A Novel Self-supervised Contrastive Framework with Spatial-Channel Attention and Temporal Transformer for Few-Shot Action Recognition

Zanxi Ruan, Yingmei Wei, Yanming Guo, Yuxiang Xie, Yifei Yuan

Published: 2023, Last Modified: 16 May 2025. Venue: ACAI 2023. License: CC BY-SA 4.0

Abstract: We introduce SCaTNet, a method for few-shot action recognition that combines contrastive learning with attention mechanisms. Unlike previous few-shot methods, SCaTNet exploits the sample data at both high- and low-dimensional levels. It integrates the Quadruplet Attention (QA) mechanism with a Multimodal Temporal Contrastive Learning (MTCL) strategy, significantly enhancing the recognition and interpretation of action features in video. Extensive experiments on the SSv2-Small dataset show that SCaTNet is competitive with existing state-of-the-art methods, highlighting its effectiveness and practical utility in few-shot action recognition.