Pose-Based Temporal Convolutional Networks for Isolated Indian Sign Language Word Recognition

TATIGUNTA BHAVI TEJA REDDY, Vidhya Kamakshi

Published: 22 Dec 2025, Last Modified: 28 Jan 2026
Proceedings of the Workshop on Sign Language Processing (WSLP), pages 51–54, December 20-24, 2025. ©2025 Association for Computational Linguistics
License: CC BY 4.0

Abstract: This paper presents a lightweight and efficient baseline for isolated Indian Sign Language (ISL) word recognition, developed for the WSLP-AACL-2025 Shared Task. We propose a two-stage framework that combines skeletal landmark extraction via MediaPipe Holistic with a Temporal Convolutional Network (TCN) for temporal sequence classification. The system processes pose-based input sequences instead of raw video, significantly reducing computation and memory costs. Trained on the WSLP-AACL-2025 dataset containing 4,398 isolated sign videos across 4,361 word classes, our model achieves 54% top-1 and 78% top-5 accuracy.
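
For readers unfamiliar with the reported metrics, top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The following is a minimal illustrative sketch, not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores and labels are assumptions made up for this example.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample score lists, one score per class (hypothetical format).
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (illustrative only): 4 samples, 3 classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

With several thousand word classes, top-5 gives credit when the correct word is among the model's five best guesses, which is why it is higher than top-1.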