Improving Continuous Sign Language Recognition via Cross-Frame Interactions in Expanded Contextual Spaces

Published: 2025, Last Modified: 04 Nov 2025, Venue: ICASSP 2025, License: CC BY-SA 4.0
Abstract: Current continuous sign language recognition (CSLR) methods typically rely on single or adjacent frames for calculations, which can overlook broader contextual information and result in lower accuracy. To address this issue, we introduce CVSign, which constructs an extended contextual space frame by frame while enabling comprehensive cross-frame interaction. Specifically, we present two innovative modules: Contextual Correspondence Awareness (CCA) and Contextual Variability Awareness (CVA). CCA enhances the relevance of contextual features by utilizing cross-frame multi-head query attention to identify and prioritize related areas while suppressing irrelevant regions. CVA captures motion changes at varying speeds by employing difference calculations between multiple frames, effectively minimizing static redundancy. Remarkably, experimental results show that CVSign outperforms the previous state-of-the-art method by a clear margin on widely used datasets, including PHOENIX14, PHOENIX14-T, and CSL-Daily.
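As a rough illustration of the idea behind CVA (differences taken between frames at several temporal distances, so that static content cancels out), the following is a minimal sketch and not the authors' implementation. The function name `multi_stride_differences`, the `(T, C)` feature layout, and the stride set `(1, 2, 4)` are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def multi_stride_differences(features, strides=(1, 2, 4)):
    """Hypothetical sketch of multi-frame differencing.

    features: array of shape (T, C), one C-dimensional feature per frame.
    strides:  temporal gaps to difference over (assumed values); small
              strides respond to fast motion, large strides to slow motion.
    Returns an array of shape (len(strides), T, C).
    """
    features = np.asarray(features, dtype=np.float32)
    num_frames = features.shape[0]
    outputs = []
    for stride in strides:
        diff = np.zeros_like(features)
        if stride < num_frames:
            # Frame t minus frame t - stride; the first `stride` frames
            # have no earlier partner and stay zero.
            diff[stride:] = features[stride:] - features[:-stride]
        outputs.append(diff)
    return np.stack(outputs, axis=0)
```

On a sequence whose frames are all identical, every difference is zero, which is the sense in which differencing removes static redundancy. How CVSign combines these differences with the attention-based CCA module is not described in the abstract and is left out here.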