Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

ACL ARR 2024 June Submission2437 Authors

15 Jun 2024 (modified: 05 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The decoder-only architecture performs poorly in multilingual neural machine translation, despite its potential benefits in zero-shot translation, i.e., the translation of language pairs unseen during training. In this work, we identify the main issue of the decoder-only architecture as its lack of language transfer capability: representations from different source languages are not aligned in the representational subspace of the target language. We propose dividing the decoding process into two stages, explicitly excluding target tokens in the first stage, which implicitly boosts transfer capability across languages. Additionally, we impose contrastive learning on translation instructions, further improving zero-shot translation performance. We conduct experiments on the TED-19 and OPUS-100 datasets, covering both training-from-scratch and fine-tuning scenarios. Experimental results show that, compared to the encoder-decoder architecture, our methods not only perform competitively in supervised translation but also achieve improvements of up to 3.39 BLEU, 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET in zero-shot translation.
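The abstract does not specify the exact form of the contrastive objective on translation instructions. The following is a minimal sketch, assuming an InfoNCE-style loss that pulls together pooled representations of parallel source sentences (same target meaning, different source languages) and pushes apart non-matching pairs within a batch; the function name, pooling choice, and temperature are illustrative, not the authors' implementation.

```python
# Sketch only: a generic InfoNCE-style contrastive loss for aligning
# cross-lingual representations of the same translation instruction.
import torch
import torch.nn.functional as F


def contrastive_instruction_loss(anchor: torch.Tensor,
                                 positive: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """anchor, positive: (batch, dim) pooled representations of the same
    source sentence / instruction expressed in two different source languages."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    # Similarity of every anchor against every positive in the batch.
    logits = anchor @ positive.t() / temperature  # (batch, batch)
    # The matching (parallel) pair sits on the diagonal.
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)
```

In this hypothetical setup, the loss would be added to the standard translation cross-entropy so that different source languages map to nearby points in the target language's representational subspace, which is the alignment property the abstract identifies as missing.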
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Multilingualism and Cross-Lingual NLP
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2437