Abstract: This paper explores a novel method for adapting existing pre-trained word embedding models of spoken languages to Sign Language glosses. These newly generated embeddings are described, visualised, and then used in the encoder and/or decoder of models for the Text2Gloss and Gloss2Text tasks of machine translation. In two translation settings (a baseline and one that includes data augmentation-based pre-training), we find that bootstrapped word embeddings for glosses improve translation across four signed/spoken language pairs. Many of these improvements are statistically significant, including those achieved by the models that use the bootstrapped gloss embeddings. Languages included: American Sign Language, Finnish Sign Language, Spanish Sign Language, Sign Language of the Netherlands.
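The abstract does not spell out the bootstrapping procedure, but the core idea (reusing spoken-language word vectors to initialise gloss embeddings) can be illustrated with a minimal sketch. The snippet below is an assumption-based illustration, not the paper's actual implementation: the use of gensim KeyedVectors, the lower-casing of gloss labels, the model file name, and the random fallback for unmatched glosses are all hypothetical choices.

```python
# Illustrative sketch: bootstrapping gloss embeddings from a pre-trained
# spoken-language word embedding model. All specifics here are assumptions.
import numpy as np
from gensim.models import KeyedVectors


def bootstrap_gloss_embeddings(gloss_vocab, spoken_kv, seed=0):
    """Initialise a gloss embedding matrix by copying the vector of the
    spoken-language word a gloss label is written with; fall back to a
    random vector when the word is missing from the spoken model."""
    rng = np.random.default_rng(seed)
    dim = spoken_kv.vector_size
    emb = np.empty((len(gloss_vocab), dim), dtype=np.float32)
    for i, gloss in enumerate(gloss_vocab):
        word = gloss.lower()  # glosses are conventionally written as upper-case lemmas
        if word in spoken_kv:
            emb[i] = spoken_kv[word]          # reuse the spoken-language vector
        else:
            emb[i] = rng.normal(scale=0.1, size=dim)  # unmatched gloss: random init
    return emb


# Hypothetical usage with a word2vec/fastText model in KeyedVectors text format:
# spoken_kv = KeyedVectors.load_word2vec_format("cc.en.300.vec", binary=False)
# gloss_matrix = bootstrap_gloss_embeddings(["HELLO", "BOOK", "IX-1"], spoken_kv)
```

The resulting matrix could then serve as the initial embedding table of the encoder and/or decoder of a Text2Gloss or Gloss2Text model, in the spirit of the setup the abstract describes.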