Inclusive Few-Shot Isolated Sign Language Recognition via Spatio-Temporal SlowFast Prototypes

ACL ARR 2025 May Submission 3727 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Sign language is a vital modality of human communication, yet current AI systems struggle to recognize it due to limited annotated data and high intra-class variability. In this work, we present a low-resource approach to isolated sign language recognition by framing it as a few-shot learning problem, using a prototypical network trained on a small support set. Our method uses a modified SlowFast convolutional architecture to extract rich spatio-temporal embeddings from sign videos, enabling metric-based comparison between support-set exemplars and query clips. Unlike conventional models that require extensive training data, our approach generalizes to unseen sign classes from only a few labeled examples. We evaluate our model on the LSA64 dataset in a strict few-shot setting, achieving 88% accuracy on held-out classes and substantially outperforming baselines. This study highlights the potential of combining efficient video representations with metric learning to enable scalable, data-efficient sign language understanding. Our results advocate for future human-AI interaction systems that are inclusive and accessible, even in low-resource communication domains.
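The metric-learning step described in the abstract matches the standard prototypical-network recipe (Snell et al., 2017): each class prototype is the mean of its support-set embeddings, and query clips are classified by negative squared Euclidean distance to the prototypes. Below is a minimal PyTorch sketch of that episode-level classification step, not the authors' implementation; the `prototypical_log_probs` helper, the random tensors, and the 5-way/3-shot sizes are illustrative assumptions, with the random vectors standing in for embeddings a modified SlowFast backbone would produce per clip.

```python
import torch
import torch.nn.functional as F

def prototypical_log_probs(support_emb: torch.Tensor,
                           support_labels: torch.Tensor,
                           query_emb: torch.Tensor,
                           n_way: int) -> torch.Tensor:
    """Classify query embeddings against per-class prototypes.

    support_emb:    (n_way * k_shot, d) embeddings of labeled support clips
    support_labels: (n_way * k_shot,)   integer class ids in [0, n_way)
    query_emb:      (n_query, d)        embeddings of unlabeled query clips
    Returns (n_query, n_way) log-probabilities over the episode's classes.
    """
    # Prototype = mean embedding of each class's support examples.
    prototypes = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in range(n_way)]
    )
    # Negative squared Euclidean distance serves as the classification logit.
    logits = -torch.cdist(query_emb, prototypes) ** 2
    return F.log_softmax(logits, dim=1)

# Toy 5-way 3-shot episode; random vectors stand in for SlowFast embeddings.
n_way, k_shot, d = 5, 3, 256
support = torch.randn(n_way * k_shot, d)
labels = torch.arange(n_way).repeat_interleave(k_shot)   # [0,0,0,1,1,1,...]
queries = torch.randn(8, d)
log_p = prototypical_log_probs(support, labels, queries, n_way)
pred = log_p.argmax(dim=1)  # predicted sign class per query clip
```

Because classification reduces to nearest-prototype matching in embedding space, an unseen sign class needs only a few labeled clips to form a usable prototype, which is what allows generalization to held-out classes without retraining the backbone.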
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: few-shot learning, representation learning, NLP for social good, video processing, human-AI interaction/cooperation, NLP in resource-constrained settings, data-efficient training
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Argentinian Sign Language (LSA; evaluated via the LSA64 dataset)
Submission Number: 3727