Keywords: sign language, low-resource translation
Abstract: In this work, we present T2G-Reasoner, a framework equipped with a reasoning mechanism to improve text-to-gloss translation (T2G), where a gloss is a written representation of sign language.
Reasoning LLMs have achieved remarkable success across a range of NLP tasks, benefiting from the strong generalization capability that stems from pretraining on massive data.
However, incentivizing reasoning capabilities for the T2G task is challenging because gloss information is absent from LLMs' pretraining data.
Considering the lexical concepts shared between the two languages, we leverage an advanced LLM to extract word-level alignments that serve as the T2G reasoning process.
Instead of directly generating sign language gloss, the proposed method structures the model's output into two distinct components, \emph{i.e.}, the word-level alignments and the final gloss translation.
T2G-Reasoner adopts a two-stage training strategy, \emph{i.e.}, SFT-based imitation and RL-based exploration.
The T2G-Reasoner model is first fine-tuned on synthetic reasoning data, which establishes a foundational layer of reasoning capability.
Since the synthetic reasoning data may be of lower quality, T2G-Reasoner further leverages an RL algorithm to autonomously discover optimal word-level alignments.
Extensive experiments on two benchmark datasets show that the proposed T2G-Reasoner achieves significant performance improvements over existing T2G methods.
Additionally, our T2G-Reasoner exhibits great potential to address out-of-vocabulary (OOV) challenges in T2G.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23323