Abstract: In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods have predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (Parallel Language Model), a collection of three 2B LLMs with varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Using this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of different prompt elements, and their cross-lingual representation space. We will make our models publicly available\footnote{We release anonymous code at \url{https://anonymous.4open.science/r/Plume_fork-69D1}}.
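To make the "training exclusively on parallel examples" setup concrete, the sketch below shows one plausible way a Catalan-centric sentence pair could be serialized into a single causal-LM training sequence with source and target language tags. The tag names, template, and end-of-sequence token are illustrative assumptions for exposition; the exact Plume prompt format is defined in the released code.

```python
# Hedged sketch (assumed format): turning a parallel sentence pair into a
# decoder-only training sequence. Language tags and the template layout are
# hypothetical, not necessarily the exact Plume prompt.

def format_parallel_example(src_lang: str, tgt_lang: str,
                            src_text: str, tgt_text: str,
                            eos_token: str = "</s>") -> str:
    """Build one training sequence: source tag + source sentence,
    then target tag + target sentence that the model learns to generate."""
    return f"<{src_lang}> {src_text}\n<{tgt_lang}> {tgt_text}{eos_token}"


if __name__ == "__main__":
    # Catalan-centric pair: Catalan appears on one side of every example.
    example = format_parallel_example(
        src_lang="cat_Latn",
        tgt_lang="eng_Latn",
        src_text="El gat dorm al sofà.",
        tgt_text="The cat is sleeping on the sofa.",
    )
    print(example)
    # <cat_Latn> El gat dorm al sofà.
    # <eng_Latn> The cat is sleeping on the sofa.</s>
```

At inference time, the same template truncated after the target-language tag serves as the translation prompt, which is what makes the individual prompt elements (tags, separators) natural objects of analysis.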
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: large language models, multilingual neural machine translation, zero-shot translation, parallel data, interpretability
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: Catalan, Spanish, French, Italian, Portuguese, Galician, German, English, Basque
Submission Number: 684