Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Massively Multilingual Machine Translation, Non-blocking, Federated Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: What is the maximal number of languages that a single machine translation model can translate? Learning a single model for a massive number of languages is a critical challenge. Prior methods focus on increasing model size and training data size. However, large models are difficult to optimize efficiently even with distributed parallel training, and translation capacity can interfere among languages. To address this challenge, we propose LegoMT2, an efficient approach with a tailored model architecture for massively multilingual neural machine translation. LegoMT2 organizes 435 languages into 8 language-centric groups and assigns one local encoder-decoder to each group and a global encoder-decoder to all languages. LegoMT2 then trains each local and global encoder-decoder on a group-dedicated set of clients through asynchronous parameter updates. We trained LegoMT2 on a large dataset of 25 billion sentence pairs that goes beyond English-centric data. LegoMT2 is 16.2$\times$ faster than distributed training of a same-size NLLB model while improving translation results by an average of 2.2 BLEU on \textit{Flores-101}~\footnote{We will release the model and code to the public.}.
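To make the architecture described in the abstract concrete, below is a minimal, hypothetical sketch of the LegoMT2-style layout: 8 language-centric groups, each with its own local encoder-decoder, plus a single global encoder-decoder shared by all languages. The group names, model dimensions, and the single `client_step` call are illustrative assumptions, not the authors' implementation; in particular, the actual system trains each group on a dedicated set of clients with non-blocking (asynchronous) parameter updates, which is only noted in comments here.

```python
# Hypothetical sketch of the local/global encoder-decoder layout from the abstract.
# Dimensions, group names, and the synchronous forward pass are assumptions.
import torch
import torch.nn as nn

NUM_GROUPS = 8               # abstract: 435 languages organized into 8 groups
D_MODEL, N_LAYERS = 512, 6   # assumed toy dimensions

def make_encoder_decoder():
    """One standard Transformer encoder-decoder block."""
    return nn.Transformer(d_model=D_MODEL,
                          num_encoder_layers=N_LAYERS,
                          num_decoder_layers=N_LAYERS,
                          batch_first=True)

# One local encoder-decoder per language-centric group ...
local_models = {f"group_{g}": make_encoder_decoder() for g in range(NUM_GROUPS)}
# ... and one global encoder-decoder covering all languages.
global_model = make_encoder_decoder()

def client_step(group, src, tgt):
    """A client trains its group's local model together with the global model.

    In LegoMT2, each group is handled by a group-dedicated set of clients that
    push parameter updates asynchronously (non-blocking); this sketch only
    shows a single synchronous forward pass for illustration.
    """
    local_out = local_models[group](src, tgt)
    global_out = global_model(src, tgt)
    return local_out, global_out

# Toy usage with random embeddings standing in for tokenized sentence pairs.
src = torch.randn(2, 16, D_MODEL)   # (batch, src_len, d_model)
tgt = torch.randn(2, 16, D_MODEL)   # (batch, tgt_len, d_model)
local_out, global_out = client_step("group_0", src, tgt)
print(local_out.shape, global_out.shape)
```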
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4754