Abstract: In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods have predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (Parallel Language Model), a collection of three 2B LLMs with varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Using this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of different prompt elements, and their cross-lingual representation space. We will make our models publicly available\footnote{We release anonymous code at \url{https://anonymous.4open.science/r/Plume_fork-69D1}}.
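To make the "training exclusively on parallel examples" setup concrete, the sketch below shows one plausible way a Catalan-centric sentence pair could be serialized into a single causal-LM training sequence with source and target language tags. The tag names, template, and end-of-sequence token are illustrative assumptions for exposition; the exact Plume prompt format is defined in the released code.

```python
# Hedged sketch (assumed format): turning a parallel sentence pair into a
# decoder-only training sequence. Language tags and the template layout are
# hypothetical, not necessarily the exact Plume prompt.

def format_parallel_example(src_lang: str, tgt_lang: str,
                            src_text: str, tgt_text: str,
                            eos_token: str = "</s>") -> str:
    """Build one training sequence: source tag + source sentence,
    then target tag + target sentence that the model learns to generate."""
    return f"<{src_lang}> {src_text}\n<{tgt_lang}> {tgt_text}{eos_token}"


if __name__ == "__main__":
    # Catalan-centric pair: Catalan appears on one side of every example.
    example = format_parallel_example(
        src_lang="cat_Latn",
        tgt_lang="eng_Latn",
        src_text="El gat dorm al sofà.",
        tgt_text="The cat is sleeping on the sofa.",
    )
    print(example)
    # <cat_Latn> El gat dorm al sofà.
    # <eng_Latn> The cat is sleeping on the sofa.</s>
```

At inference time, the same template truncated after the target-language tag serves as the translation prompt, which is what makes the individual prompt elements (tags, separators) natural objects of analysis.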
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: large language models, multilingual neural machine translation, zero-shot translation, parallel data, interpretability
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: Catalan, Spanish, French, Italian, Portuguese, Galician, German, English, Basque
Submission Number: 684