Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: Multitask Optimization, Multilingual, Pre-training, Language Models, Language Sampling, Low Resource Languages, Overfitting
TL;DR: We present a simple method for multitask optimization in the presence of data imbalance, and verify its efficacy via thorough empirical evaluations.
Abstract: In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high- and low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under what data regimes this method is applicable and show its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
Submission Number: 5308
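Read as a recipe, the method described in the abstract is a two-stage schedule over the language sampling distribution: first train only on the high-resource tasks, then continue training on a mixture that includes the low-resource tasks. The sketch below illustrates that schedule under stated assumptions; the language pairs, corpus sizes, step counts, proportional-to-size weighting, and the `train_step` stub are all illustrative placeholders, not the paper's actual setup or code.

```python
import random

# Hypothetical per-language corpora; the size skew mimics the data imbalance
# the paper studies (names and sizes are illustrative only).
datasets = {
    "en-fr": [f"fr_example_{i}" for i in range(100_000)],
    "en-de": [f"de_example_{i}" for i in range(80_000)],
    "en-xh": [f"xh_example_{i}" for i in range(500)],
}
HIGH_RESOURCE = ["en-fr", "en-de"]   # assumed split into high-/low-resource tasks
ALL_LANGUAGES = list(datasets)


def sample_batch(languages, batch_size=32):
    """Draw a batch, picking each example's language with probability
    proportional to corpus size within the given language set
    (a simple static weighting; other mixing weights could be used)."""
    weights = [len(datasets[lang]) for lang in languages]
    batch = []
    for _ in range(batch_size):
        lang = random.choices(languages, weights=weights, k=1)[0]
        batch.append(random.choice(datasets[lang]))
    return batch


def train_step(batch):
    """Placeholder for one optimizer step on `batch`; the actual model
    update is omitted in this sketch."""
    pass


# Stage 1: pre-train on the high-resource tasks only.
for _ in range(10_000):
    train_step(sample_batch(HIGH_RESOURCE))

# Stage 2: fine-tune on a mixture that also includes the low-resource tasks.
for _ in range(2_000):
    train_step(sample_batch(ALL_LANGUAGES))
```

The key design choice this sketch highlights is that the low-resource data only enters in the second stage, so its sampling weight matters for far fewer steps, which is how the approach aims to improve on a single static weighting applied throughout training.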