Low-Resource Machine Translation through the Lens of Personalized Federated Learning

ACL ARR 2024 June Submission 2798 Authors

15 Jun 2024 (modified: 08 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: We present a new approach based on the Personalized Federated Learning algorithm MeritFed that can be applied to natural language tasks with heterogeneous data. We evaluate it on the Low-Resource Machine Translation task, using the dataset from the Large-Scale Multilingual Machine Translation Shared Task (Small Track #2) and a subset of Sami languages from a multilingual benchmark for Finno-Ugric languages. In addition to being effective, MeritFed is also highly interpretable, as it can be used to track the impact of each language used for training. Our analysis reveals that the target dataset size affects the weight distribution across auxiliary languages, that unrelated languages do not interfere with training, and that the parameters of the auxiliary optimizer have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments.
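The abstract describes learning aggregation weights over auxiliary languages with an auxiliary optimizer so that updates help the target language. Below is a minimal, illustrative sketch of what such a MeritFed-style weighted-aggregation step could look like on a toy regression problem; the toy data, hyperparameters, and the exponentiated-gradient weight update are assumptions made for illustration, not the authors' released implementation.

```python
# Illustrative sketch: weight per-"language" gradients by their usefulness
# on a target validation set (MeritFed-style aggregation on toy data).
import numpy as np

rng = np.random.default_rng(0)
d, n_aux = 5, 4                         # model dimension, number of auxiliary datasets
w_true = rng.normal(size=d)

def make_split(n, noise):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + noise * rng.normal(size=n)

aux = [make_split(200, noise=0.1 * (k + 1)) for k in range(n_aux)]  # auxiliary "languages"
X_val, y_val = make_split(50, noise=0.1)                            # target validation set

def grad(X, y, th):
    return X.T @ (X @ th - y) / len(y)

theta = np.zeros(d)                     # shared model parameters
weights = np.full(n_aux, 1.0 / n_aux)   # aggregation weights on the simplex
lr, lr_w = 0.05, 1.0                    # model and auxiliary-optimizer step sizes (assumed)

for step in range(200):
    gs = np.stack([grad(X, y, theta) for X, y in aux])    # per-language gradients
    # Auxiliary optimizer: exponentiated-gradient step on the target validation loss,
    # viewed as a function of the aggregation weights.
    theta_trial = theta - lr * (weights @ gs)
    val_g = grad(X_val, y_val, theta_trial)
    dL_dw = -lr * (gs @ val_g)          # d(val loss)/d(weight_k)
    weights *= np.exp(-lr_w * dL_dw)
    weights /= weights.sum()            # project back onto the simplex
    theta -= lr * (weights @ gs)        # aggregate update with the learned weights

print("learned weights per auxiliary dataset:", np.round(weights, 3))
```

In this sketch, the learned weights themselves provide the interpretability the abstract mentions: auxiliary datasets whose gradients help the target validation loss receive larger weights, while unhelpful ones are driven toward zero.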
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Low-Resource Languages, Machine Translation, Interpretability
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Theory
Languages Studied: English, Finnish, Javanese, Malay, Indonesian, Tagalog, Tamil, North Sami, South Sami, Inari Sami, Skolt Sami
Submission Number: 2798