Keywords: Sign Language Recognition; Sign Language Translation
TL;DR: This paper integrates SLT subtasks into a single framework named MonoSLT based on the monotonically aligned nature of SLT subtasks, and adopts two kinds of constraints to further improve the faithfulness of SLT models.
Abstract: Sign language translation (SLT) aims to translate perceived visual signals into spoken language. Recent works have achieved impressive performance by improving visual representations and adopting advanced machine translation techniques, but faithfulness (i.e., whether the SLT model captures the correct visual signals) has not received enough attention. In this paper, we explore the associations among SLT-relevant tasks and find that imprecise glosses and limited corpora may hinder faithfulness in SLT. To improve faithfulness, we first integrate SLT subtasks into a single framework named MonoSLT, which shares the acquired knowledge among SLT subtasks based on their monotonically aligned nature. We further propose two kinds of constraints: the alignment constraint aligns the visual and linguistic embeddings through a shared translation module and synthetic code-switching corpora, while the consistency constraint integrates the advantages of the subtasks by regularizing prediction consistency. Experimental results show that the proposed MonoSLT is competitive with previous SLT methods by increasing the utilization of visual signals, especially when glosses are imprecise.
Supplementary Material: pdf
Submission Number: 11748