Assessing Large Pre-trained Models for Sign Language Processing: Is Text-Only Superior to Multimodal?

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission
Abstract: Motivated by the recent success of text-only modeling in certain vision-language tasks, this paper proposes that sign language processing can likewise use (large) text-only language models for inference, freeing sign language models from the necessity of low-resource multimodal learning from scratch. To compare the performance of pre-trained text-only models against multimodal ones, we introduce the first text-only and multimodal large (7B) language models to be pre-trained and then fine-tuned on a sign language recognition task. We propose new prompting and fine-tuning strategies for text-only signed language processing, incorporating both the linguistics of signed languages and theoretically motivated techniques for mitigating catastrophic forgetting (of spoken language). We test how these models generalize to other sign language recognition and generation tasks, showing that text-only models are capable sign language models that remain adept at spoken language tasks and, with a change of prompt, can even generalize to new prosodic and iconic sign recognition tasks. Finally, we analyze the trade-offs between our text-only and multimodal models. Our code and model checkpoints will be open-source.
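The abstract does not specify how sign input is presented to a text-only model; one common route in the sign language processing literature is gloss transcription. Purely as an illustration of what prompting a pre-trained text-only 7B model for a recognition-style task might look like, here is a minimal sketch. The model name, the prompt template, and the gloss sequence below are all hypothetical assumptions, not the paper's actual method:

```python
# Hypothetical sketch: prompting a text-only LM with sign language glosses.
# The model choice, prompt template, and gloss pipeline are illustrative
# assumptions; the paper's concrete setup is not described on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder text-only 7B model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# DGS (German Sign Language) glosses, e.g. produced by an external
# pose-to-gloss recognizer; this gloss sequence is invented for illustration.
glosses = "HEUTE WETTER SONNE"

# A task prompt that frames recognition as gloss-to-text translation.
prompt = (
    "The following is a sequence of German Sign Language glosses.\n"
    f"Glosses: {glosses}\n"
    "German translation:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Because the task framing lives entirely in the prompt, swapping the final line of the template (e.g., asking about prosodic or iconic properties of the glosses instead of a translation) is how such a model could be redirected to new recognition tasks without retraining.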
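The abstract also mentions theoretically motivated strategies for mitigating catastrophic forgetting of spoken language, without naming them. One standard technique from the continual-learning literature is rehearsal (replay): mixing original-domain examples into each fine-tuning batch. The batching function below is a hypothetical sketch of that general idea only; whether the paper uses this scheme is not stated here:

```python
# Hypothetical rehearsal/replay sketch: mix spoken-language examples into
# each sign-language fine-tuning batch so the model keeps seeing its
# original domain. Ratios and structure are illustrative assumptions.
import random

def mixed_batches(sign_data, spoken_data, batch_size=8, replay_ratio=0.25):
    """Yield batches that are mostly sign-language examples plus a fixed
    fraction of spoken-language 'replay' examples."""
    n_replay = max(1, int(batch_size * replay_ratio))
    n_sign = batch_size - n_replay
    random.shuffle(sign_data)
    for i in range(0, len(sign_data) - n_sign + 1, n_sign):
        # Combine sign-language examples with randomly drawn replay examples.
        batch = sign_data[i:i + n_sign] + random.sample(spoken_data, n_replay)
        random.shuffle(batch)
        yield batch
```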
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings (efficiency), Publicly available software and/or pre-trained models, Data analysis
Languages Studied: German Sign Language, German, English