Keywords: scaling laws, machine translation, multilinguality, multi-task optimization
TL;DR: We study the scaling behavior of multilingual, multi-task neural machine translation models.
Abstract: In this work, we provide a large-scale empirical study of the scaling properties of multilingual (multitask) neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the individual task weights on the scaling behavior. We find that these weights only affect the multiplicative factor of the scaling law and in particular, the scaling exponent is unaffected by them. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each task and examine the role of language similarity in the scaling behavior of our models. We find minimal evidence that language similarity has any impact. In contrast, ``direction'' of the multilinguality plays a big role, with models translating from multiple languages into English having a larger number of effective parameters per task than their reversed counterparts. Finally, we leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale, greatly reducing efforts required for task balancing in large multitask models. Our findings apply to both in-domain and out-of-domain test sets and to multiple evaluation metrics, such as ChrF and BLEURT.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning