Abstract: Vision-based sign language recognition is an extensively researched problem aimed at advancing communication between deaf and hearing individuals. Numerous Sign Language Recognition (SLR) datasets have been introduced to promote research in this field, spanning multiple languages, vocabulary sizes, and signers. However, most existing popular datasets focus predominantly on the frontal view of signers, neglecting visual information from other perspectives. In practice, many sign languages contain words with similar hand movements and expressions, making it challenging to differentiate between them from a single frontal view. Although a few studies have proposed sign language datasets using multi-view data, these datasets remain limited in vocabulary size and scale, hindering their generalizability and practicality. To address this issue, we introduce a new large-scale, multi-view sign language recognition dataset spanning 1,000 glosses and 30 signers, resulting in over 84,000 multi-view videos. To the best of our knowledge, this is the first multi-view sign language
recognition dataset of this scale. In conjunction with offering a comprehensive dataset, we perform extensive experiments to assess the performance of state-of-the-art Sign Language Recognition models on our dataset. The findings indicate that utilizing multi-view data substantially enhances accuracy across all models, with performance improvements of up to 19.75% compared to models trained on single-view data. Our dataset and baseline models are publicly accessible on GitHub.