Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets

06 Jun 2022, 17:16 (modified: 12 Oct 2022, 18:40) NeurIPS 2022 Datasets and Benchmarks Readers: Everyone
Keywords: sign language
TL;DR: We release the largest available pretraining dataset for sign language across multiple languages and show that multilingual fine-tuning with a unified vocabulary helps achieve SOTA results.
Abstract: There are over 300 sign languages in the world, many of which have very limited or no labelled sign-to-text datasets. Self-supervised pretraining and multilingual fine-tuning have been shown to be effective in low-resource scenarios in natural language and speech processing. In this work, we apply these ideas to sign language recognition. We make three contributions.
- First, we release SignCorpus, a large pretraining dataset comprising about 4.6K hours of signing data across 10 sign languages. SignCorpus is curated from sign language videos on the internet, filtered for data quality, and converted into sequences of pose keypoints, thereby removing all personally identifiable information (PII).
- Second, we release Sign2Vec, a graph-based model with 5.2M parameters that is pretrained on SignCorpus. We envisage Sign2Vec as a multilingual large-scale pretrained model which can be fine-tuned for various sign recognition tasks across languages.
- Third, we create MultiSign-ISLR, a multilingual and label-aligned dataset of sequences of pose keypoints from 11 labelled datasets across 7 sign languages, and MultiSign-FS, a new finger-spelling training and test set across 7 languages. On these datasets, we fine-tune Sign2Vec to create multilingual isolated sign recognition models.
With experiments on multiple benchmarks, we show that pretraining and multilingual transfer are effective, giving significant gains over state-of-the-art results. All datasets, models, and code have been made open-source via the OpenHands toolkit.
Supplementary Material: zip
Dataset Url: Unlabeled pretraining datasets for pose-based self-supervised learning: Label-aligned ISLR and finger-spelling pose-based datasets:
License: MIT License
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes