Integrating Spoken and Signed Languages for Inclusive and Modality-Independent Large Language Models
Abstract: Sign language processing (SLP) is often reduced to translation using state-of-the-art computer vision models combined with neural machine translation systems. By contrast, a growing field of instruction-tuned large language models can accomplish multiple NLP tasks end-to-end. However, signed languages are not included in these models; instead, special-purpose translation models are developed for them. This paper proposes that SLP can be included in (large) language model development, freeing sign language models from the necessity of low-resource multimodal learning from scratch. We introduce the first text-only and multimodal large (7B) LLaMA-based language models to be pre-trained and then fine-tuned on a sign language recognition task. We propose new prompting and fine-tuning strategies for text-only and multimodal SLP, incorporating both the linguistics of signed languages and theoretically motivated strategies to mitigate catastrophic forgetting of spoken language. We test how well these models generalize to other SLP tasks, showing that LLMs are also capable sign language models that remain adept at spoken language tasks and, with a change of prompt, can even generalize to new prosodic and iconic sign translation tasks. Finally, we analyze trade-offs between our text-only and multimodal models. Our code and model checkpoints will be open-source. We will update our model suite as newer open-source LLMs, datasets, and SLP tasks become available.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: signed language, sign languages, LLM, multimodal LLM, VLM, pre-training, prompting, cross-modal pretraining, cross-modal machine translation, multimodality
Contribution Types: Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: German Sign Language (DGS), German, English
Submission Number: 770