Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets
Keywords: sign language
TL;DR: We release the largest available pretraining dataset for sign languages, spanning multiple languages, and show that multilingual fine-tuning with a unified vocabulary achieves state-of-the-art (SOTA) results.
Abstract: There are over 300 sign languages in the world, many of which have very limited or no labelled sign-to-text datasets. To address such low-resource data scenarios, self-supervised pretraining and multilingual fine-tuning have been shown to be effective in natural language and speech processing. In this work, we apply these ideas to sign language recognition.
We make three contributions.
- First, we release SignCorpus, a large pretraining dataset for sign languages comprising about 4.6K hours of signing data across 10 sign languages. SignCorpus is curated from sign language videos on the internet, filtered for data quality, and converted into sequences of pose keypoints, thereby removing all personally identifiable information (PII); one possible extraction pipeline is sketched below.
- Second, we release Sign2Vec, a graph-based model with 5.2M parameters that is pretrained on SignCorpus. We envisage Sign2Vec as a large-scale multilingual pretrained model that can be fine-tuned for various sign recognition tasks across languages (an illustrative graph layer is sketched below).
- Third, we create MultiSign-ISLR -- a multilingual, label-aligned dataset of pose-keypoint sequences drawn from 11 labelled datasets across 7 sign languages -- and MultiSign-FS -- a new finger-spelling training and test set across 7 languages. On these datasets, we fine-tune Sign2Vec to create multilingual isolated sign recognition models (the unified-vocabulary construction is sketched below). With experiments on multiple benchmarks, we show that pretraining and multilingual transfer are effective, yielding significant gains over state-of-the-art results.
All datasets, models, and code have been made open-source via the OpenHands toolkit.
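To make the pose-conversion step concrete, here is a minimal sketch of turning a signing video into a keypoint sequence, assuming MediaPipe Holistic as the pose estimator. The abstract does not name the exact estimator or keypoint layout, so treat this as one plausible pipeline rather than the paper's exact one.

```python
# Minimal sketch: convert a signing video into a sequence of pose keypoints,
# so that no raw video (and hence no PII) needs to be stored.
# Assumes MediaPipe Holistic as the pose estimator; the paper's exact
# extraction pipeline may differ.
import cv2
import numpy as np
import mediapipe as mp

def video_to_keypoints(video_path: str) -> np.ndarray:
    """Return an array of shape (num_frames, 75, 3) with x, y, z per keypoint."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = []
        # Body pose (33 points) plus both hands (21 points each); missing
        # detections are filled with zeros to keep a fixed layout.
        for landmarks, n in [(results.pose_landmarks, 33),
                             (results.left_hand_landmarks, 21),
                             (results.right_hand_landmarks, 21)]:
            if landmarks is None:
                keypoints.extend([(0.0, 0.0, 0.0)] * n)
            else:
                keypoints.extend((p.x, p.y, p.z) for p in landmarks.landmark)
        frames.append(keypoints)
    cap.release()
    holistic.close()
    return np.asarray(frames, dtype=np.float32)
```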
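The abstract describes Sign2Vec only as a graph-based model, so the layer below is an illustrative spatial graph convolution over the keypoint skeleton, in the spirit of ST-GCN-style encoders. It is not the actual Sign2Vec architecture; all shapes and names are assumptions.

```python
# Illustrative spatial graph convolution over pose keypoints; NOT the
# actual Sign2Vec architecture, which is not detailed in the abstract.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer: mix features along skeleton edges."""
    def __init__(self, in_dim: int, out_dim: int, adjacency: torch.Tensor):
        super().__init__()
        # Row-normalized adjacency with self-loops over the keypoint skeleton.
        a = adjacency + torch.eye(adjacency.size(0))
        self.register_buffer("a_norm", a / a.sum(dim=1, keepdim=True))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, keypoints, features); aggregate neighbours,
        # then project each keypoint's features.
        x = torch.einsum("kj,btjf->btkf", self.a_norm, x)
        return torch.relu(self.proj(x))
```

In practice, such spatial layers are typically interleaved with temporal convolutions and pooled over frames to obtain a clip-level embedding suitable for pretraining objectives and downstream fine-tuning.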
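Finally, a minimal sketch of what a unified label vocabulary across datasets could look like: identical gloss strings from different datasets collapse to one class id, so a single classification head can be fine-tuned multilingually. The dataset names and glosses below are hypothetical placeholders, and the paper's label alignment may be more involved (e.g., mapping semantically equivalent signs rather than identical strings).

```python
# Minimal sketch of building a unified label vocabulary across several
# ISLR datasets so one classifier head can be fine-tuned multilingually.
# Dataset names and gloss strings below are hypothetical placeholders.
from typing import Dict, List

def build_unified_vocab(datasets: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every gloss to one class id, shared across datasets."""
    vocab: Dict[str, int] = {}
    for glosses in datasets.values():
        for gloss in glosses:
            vocab.setdefault(gloss, len(vocab))
    return vocab

# Example: two hypothetical datasets whose overlapping gloss ("HELLO")
# collapses to the same class id, enabling cross-lingual transfer.
vocab = build_unified_vocab({
    "DatasetA": ["HELLO", "THANKS"],  # hypothetical labels
    "DatasetB": ["HELLO", "BOOK"],    # hypothetical labels
})
assert vocab["HELLO"] == 0 and len(vocab) == 3
```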
Author Statement: Yes
Dataset Url: Unlabeled pretraining datasets for pose-based self-supervised learning:
https://openhands.ai4bharat.org/en/latest/instructions/self_supervised.html
Label-aligned ISLR and finger-spelling pose-based datasets:
https://openhands.ai4bharat.org/en/latest/instructions/datasets.html
License: MIT License
Supplementary Material: zip
Contribution Process Agreement: Yes
In Person Attendance: Yes