S\(^2\)Net: Skeleton-Aware SlowFast Network for Efficient Sign Language Recognition

Published: 2022, Last Modified: 09 Nov 2025, ACCV (4) 2022, CC BY-SA 4.0
Abstract: Continuous Sign Language Recognition (CSLR) aims to interpret meaning from signers’ postures and movements. Joint-wise correspondences between estimated skeleton data and sign videos provide complementary insights into appearance and motion. In this paper, we propose a Skeleton-aware SlowFast Network (S\(^2\)Net) to effectively capture the appearance and motion information in sign videos. S\(^2\)Net uses skeleton data in the fast pathway and video data in the slow pathway, and progressively integrates the two streams of information. We first project both skeleton and video data into a unified graph-structured space and employ a consistent GCN-based architecture for both pathways. We then propose a group-wise cross-attention module to fuse intermediate features between the pathways. Finally, a frame-wise fusion pathway integrates the semantic information at the sequence level. Experimental results on three public datasets demonstrate the effectiveness and efficiency of the proposed method.
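The abstract does not spell out the group-wise cross-attention module, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each pathway yields a sequence of per-frame feature vectors, that the channels are split into equal groups, and that each group of the target pathway attends over the same group of the other pathway with scaled dot-product attention. The function names, the identity (unlearned) query/key/value projections, and the residual addition are all assumptions made for illustration; plain Python is used to keep the sketch dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_groups(features, num_groups):
    """Split each frame's channel vector into num_groups equal chunks.

    features: list of frames, each a list of C floats.
    Returns a list of num_groups sequences, each holding per-frame sub-vectors.
    """
    channels = len(features[0])
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    size = channels // num_groups
    return [
        [frame[g * size:(g + 1) * size] for frame in features]
        for g in range(num_groups)
    ]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: every query frame attends over all
    frames of the other pathway (used as both keys and values here)."""
    scale = math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        out.append([
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ])
    return out


def group_wise_cross_attention(target, source, num_groups):
    """Fuse `source` features into `target`, one channel group at a time.

    Assumption: the attended context is added back to the target as a
    residual; a real module would also use learned projections.
    """
    t_groups = split_groups(target, num_groups)
    s_groups = split_groups(source, num_groups)
    attended = [cross_attend(tg, sg) for tg, sg in zip(t_groups, s_groups)]
    fused = []
    for t, frame in enumerate(target):
        # Concatenate the per-group contexts back into one channel vector.
        context = [v for group in attended for v in group[t]]
        fused.append([x + c for x, c in zip(frame, context)])
    return fused


# Hypothetical toy inputs: a slow pathway with 2 frames and a fast pathway
# with 4 frames, both with 4 channels split into 2 groups.
slow = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.2, 0.8]]
fast = [[0.1, 0.9, 0.3, 0.7], [0.8, 0.2, 0.6, 0.4],
        [0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0]]
fused_slow = group_wise_cross_attention(slow, fast, num_groups=2)
```

Because attention is computed over the source frames, the two pathways may have different temporal lengths, as in a SlowFast design; the fused output keeps the target pathway's frame count and channel width.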