Abstract: Vision-based sign language recognition is an extensively researched problem aimed at advancing communication between deaf and hearing individuals. Numerous Sign Language Recognition (SLR) datasets have been introduced to promote research in this field, spanning multiple languages, vocabulary sizes, and signers. However, most existing popular datasets focus predominantly on the frontal view of signers, neglecting visual information from other perspectives. In practice, many sign languages contain words with similar hand movements and expressions, making it challenging to differentiate between them from a single frontal view. Although a few studies have proposed sign language datasets using multi-view data, these datasets remain limited in vocabulary size and scale, hindering their generalizability and practicality. To address this issue, we introduce a new large-scale, multi-view sign language recognition dataset spanning 1,000 glosses and 30 signers, resulting in over 84,000 multi-view videos. To the best of our knowledge, this is the first multi-view sign language
recognition dataset of this scale. In conjunction with offering a comprehensive dataset, we perform extensive experiments to assess the performance of state-of-the-art Sign Language Recognition models on our dataset. The findings indicate that utilizing multi-view data substantially enhances accuracy across all models, with performance improvements of up to 19.75% compared to models trained on single-view data. Our dataset and baseline models are publicly accessible on GitHub.