Abstract: This paper presents a lightweight and efficient baseline for isolated Indian Sign Language (ISL) word recognition, developed for the WSLP-AACL-2025 Shared Task. We propose a two-stage framework that combines skeletal landmark extraction with MediaPipe Holistic and a Temporal Convolutional Network (TCN) for temporal sequence classification. Because the system processes pose-based landmark sequences instead of raw video, it substantially reduces computation and memory costs. Trained on the WSLP-AACL-2025 dataset, which contains 4,398 isolated sign videos across 4,361 word classes, our model achieves 54% top-1 and 78% top-5 accuracy.
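To make the second stage more concrete, the following is a minimal, dependency-free Python sketch of a TCN-style classifier over landmark sequences. It assumes each frame has already been reduced to a flat feature vector of landmark coordinates. The landmark subset, channel sizes, kernel size and dilation schedule are illustrative assumptions, not the authors' configuration. It also uses a simplified residual block with a single dilated causal convolution, and every function name is hypothetical. A practical system would build the same structure with a deep learning framework.

```python
import random
from typing import List, Sequence, Tuple

Frames = List[List[float]]        # T frames x C features per frame
Kernel = List[List[List[float]]]  # [out_channel][tap][in_channel]


def causal_dilated_conv1d(x: Frames, weights: Kernel, bias: List[float], dilation: int) -> Frames:
    """Output at frame t sees only x[t], x[t - d], x[t - 2d], ... (zero-padded on the left)."""
    out = []
    for t in range(len(x)):
        row = []
        for w_out, b in zip(weights, bias):
            acc = b
            for k, w_tap in enumerate(w_out):
                src = t - k * dilation
                if src < 0:
                    continue  # implicit left zero-padding: no frames before the clip start
                acc += sum(w * v for w, v in zip(w_tap, x[src]))
            row.append(acc)
        out.append(row)
    return out


def temporal_block(x: Frames, weights: Kernel, bias: List[float], dilation: int) -> Frames:
    """Simplified residual block: ReLU(dilated causal conv) + identity (channels must match)."""
    y = causal_dilated_conv1d(x, weights, bias, dilation)
    return [[max(0.0, a) + r for a, r in zip(row_y, row_x)] for row_y, row_x in zip(y, x)]


def global_avg_pool(x: Frames) -> List[float]:
    """Average each feature channel over time to get one clip-level vector."""
    return [sum(col) / len(x) for col in zip(*x)]


def top_k(logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames visible to the last output frame of a stack of causal convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def init_conv(c_out: int, c_in: int, taps: int, rng: random.Random) -> Tuple[Kernel, List[float]]:
    """Small random weights and zero bias for one convolution layer."""
    w = [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)] for _ in range(taps)] for _ in range(c_out)]
    return w, [0.0] * c_out


def classify(x: Frames, blocks, head_w, head_b, k: int = 5) -> List[int]:
    """Run the TCN stack, pool over time, apply a linear head, return top-k class ids."""
    h = x
    for w, b, d in blocks:
        h = temporal_block(h, w, b, d)
    pooled = global_avg_pool(h)
    logits = [b + sum(w * v for w, v in zip(row, pooled)) for row, b in zip(head_w, head_b)]
    return top_k(logits, k)


if __name__ == "__main__":
    rng = random.Random(0)
    C, T, NUM_CLASSES = 6, 20, 10  # toy sizes, not the paper's
    dilations = [1, 2, 4]
    blocks = [(*init_conv(C, C, 3, rng), d) for d in dilations]
    head_w = [[rng.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(NUM_CLASSES)]
    head_b = [0.0] * NUM_CLASSES
    clip = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(T)]  # stand-in for landmarks
    print("top-5 class ids:", classify(clip, blocks, head_w, head_b))
    print("receptive field (frames):", receptive_field(3, dilations))
```

With kernel size 3 and dilations 1, 2 and 4, the receptive field is 1 + 2·(1 + 2 + 4) = 15 frames. Doubling the dilation per block is one common way a TCN covers a whole sign with few layers while operating only on compact landmark vectors rather than pixels.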