Unsupervised Sign Language Translation and Generation

Submitted to ICLR 2024 on 24 Sept 2023 (modified: 11 Feb 2024)
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: unsupervised, sign language translation, natural language processing
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Sign language translation and generation are crucial for facilitating communication between the deaf and hearing communities. However, the scarcity of parallel sign language video-text data poses a considerable challenge to developing effective sign language translation and generation systems. Motivated by the success of unsupervised neural machine translation (UNMT), this paper introduces an unsupervised sign language translation and generation network (USLNet), which learns from abundant single-modality (text and video) data without parallel sign language data. Inspired by UNMT, USLNet comprises two main components: single-modality reconstruction modules (text and video), which rebuild the input from a noisy version in the same modality, and cross-modality back-translation modules (text-video-text and video-text-video), which reconstruct the input from a noisy version in the other modality via a back-translation procedure. Unlike the single-modality back-translation procedure in text-based UNMT, USLNet must handle a cross-modality discrepancy in feature representation: text and video sequences mismatch in both length and feature dimension. To address this issue, we propose a sliding-window method to align variable-length text with video sequences. To our knowledge, USLNet is the first unsupervised sign language translation and generation model capable of producing both text and sign language video in a unified manner. Experimental results on the BBC-Oxford Sign Language dataset (BOBSL) reveal that USLNet achieves competitive results compared to supervised baseline models, indicating its effectiveness in sign language translation and generation.
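The abstract names a sliding-window method for aligning variable-length video sequences with text but does not specify its details. As one illustration of the general idea only, a window of video frame features can be average-pooled per text-token slot; the function name, stride/window heuristics, and pooling choice below are assumptions for the sketch, not the authors' actual method:

```python
import numpy as np

def sliding_window_align(video_feats, num_tokens, window_scale=1.5):
    """Illustrative sketch (not the paper's method): map a variable-length
    sequence of per-frame video features onto `num_tokens` slots by
    average-pooling overlapping windows of frames.

    video_feats: (num_frames, dim) array of per-frame features.
    Returns: (num_tokens, dim) array, one pooled feature per text token.
    """
    num_frames, dim = video_feats.shape
    # Space window centers evenly across the video.
    stride = num_frames / num_tokens
    # Windows slightly wider than the stride overlap, so adjacent
    # token slots share boundary frames.
    window = max(1, int(round(stride * window_scale)))
    aligned = np.empty((num_tokens, dim))
    for i in range(num_tokens):
        center = int(round((i + 0.5) * stride))
        lo = max(0, center - window // 2)
        hi = min(num_frames, lo + window)
        aligned[i] = video_feats[lo:hi].mean(axis=0)
    return aligned
```

This resolves the length mismatch (any number of frames maps to exactly `num_tokens` slots); the feature-dimension mismatch would still require a learned projection between the video and text embedding spaces.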
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8533