Neuron-Level Language Tag Injection Improves Zero-Shot Translation Performance

Published: 22 Jun 2025, Last Modified: 27 Jun 2025
Venue: ACL-SRW 2025 Poster
License: CC BY 4.0
Keywords: multilingual MT; few-shot/zero-shot MT; model architectures
TL;DR: Concatenating embedded control tokens that represent language directions onto the inputs of hidden layers can improve zero-shot translation performance.
Abstract: Language tagging, a method whereby source and target inputs are prefixed with a unique language token, has become the de facto standard for conditioning Multilingual Neural Machine Translation (MNMT) models on specific language directions. At scale, this conditioning can elicit effective zero-shot translation abilities across many languages. Expanding on previous work, we propose injection, a novel language-tagging method for MNMT in which the embedded representation of a language token is concatenated to the input of every linear layer. We explore a variety of tagging methods, with and without injection, and show that injection improves zero-shot translation performance, with gains exceeding 2 BLEU points for certain language directions in our dataset.
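To make the injection idea concrete, here is a minimal PyTorch sketch of a linear layer whose input is concatenated with a language-tag embedding, as the abstract describes. The class name, dimensions, and embedding-table setup below are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class TagInjectedLinear(nn.Module):
    """Linear layer conditioned on a language direction via injection.

    Hypothetical sketch: the embedded representation of a language token
    is concatenated onto the layer's input before the projection, so the
    weights can condition on the language direction.
    """

    def __init__(self, in_features: int, out_features: int, tag_dim: int):
        super().__init__()
        # The projection consumes the original features plus the tag embedding.
        self.proj = nn.Linear(in_features + tag_dim, out_features)

    def forward(self, x: torch.Tensor, tag_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_features); tag_emb: (batch, tag_dim).
        # Broadcast the per-sentence tag embedding across the sequence
        # dimension and concatenate it onto every position's features.
        tag = tag_emb.unsqueeze(1).expand(-1, x.size(1), -1)
        return self.proj(torch.cat([x, tag], dim=-1))

# Usage: a shared embedding table maps language-direction tokens to vectors
# (the table size and dimensions here are placeholders).
num_directions, tag_dim = 12, 32
tag_table = nn.Embedding(num_directions, tag_dim)

layer = TagInjectedLinear(in_features=512, out_features=512, tag_dim=tag_dim)
x = torch.randn(4, 10, 512)                       # (batch, seq, hidden)
tag_ids = torch.randint(0, num_directions, (4,))  # one direction per sentence
y = layer(x, tag_table(tag_ids))
print(y.shape)  # torch.Size([4, 10, 512])
```

In this reading, the output shape of each layer is unchanged; only the input width grows by the tag dimension, which keeps the rest of the network architecture intact while conditioning every linear projection on the language direction.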
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 51