Neuron-Level Language Tag Injection Improves Zero-Shot Translation Performance

Published: 22 Jun 2025, Last Modified: 27 Jun 2025
Venue: ACL-SRW 2025 Poster
License: CC BY 4.0
Keywords: multilingual MT; few-shot/zero-shot MT; model architectures
TL;DR: Concatenating embedded control tokens that represent language directions onto the inputs of hidden layers can improve zero-shot translation performance.
Abstract: Language tagging, a method whereby source and target inputs are prefixed with a unique language token, has become the de facto standard for conditioning Multilingual Neural Machine Translation (MNMT) models on specific language directions. At scale, this conditioning can elicit effective zero-shot translation abilities across many languages. Expanding on previous work, we propose injection, a novel language-tagging method for MNMT in which the embedded representation of a language token is concatenated to the input of every linear layer. We explore a variety of tagging methods, with and without injection, and show that injection improves zero-shot translation performance, with gains exceeding 2 BLEU points for certain language directions in our dataset.
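To make the injection idea concrete, here is a minimal PyTorch sketch of a linear layer whose input is concatenated with a language-tag embedding, as the abstract describes. The class name, dimensions, and embedding-table setup below are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class TagInjectedLinear(nn.Module):
    """Linear layer conditioned on a language direction via injection.

    Hypothetical sketch: the embedded representation of a language token
    is concatenated onto the layer's input before the projection, so the
    weights can condition on the language direction.
    """

    def __init__(self, in_features: int, out_features: int, tag_dim: int):
        super().__init__()
        # The projection consumes the original features plus the tag embedding.
        self.proj = nn.Linear(in_features + tag_dim, out_features)

    def forward(self, x: torch.Tensor, tag_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_features); tag_emb: (batch, tag_dim).
        # Broadcast the per-sentence tag embedding across the sequence
        # dimension and concatenate it onto every position's features.
        tag = tag_emb.unsqueeze(1).expand(-1, x.size(1), -1)
        return self.proj(torch.cat([x, tag], dim=-1))

# Usage: a shared embedding table maps language-direction tokens to vectors
# (the table size and dimensions here are placeholders).
num_directions, tag_dim = 12, 32
tag_table = nn.Embedding(num_directions, tag_dim)

layer = TagInjectedLinear(in_features=512, out_features=512, tag_dim=tag_dim)
x = torch.randn(4, 10, 512)                       # (batch, seq, hidden)
tag_ids = torch.randint(0, num_directions, (4,))  # one direction per sentence
y = layer(x, tag_table(tag_ids))
print(y.shape)  # torch.Size([4, 10, 512])
```

In this reading, the output shape of each layer is unchanged; only the input width grows by the tag dimension, which keeps the rest of the network architecture intact while conditioning every linear projection on the language direction.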
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 51