When does Parameter-Efficient Transfer Learning Work for Machine Translation?

Anonymous

08 Mar 2022 (modified: 05 May 2023) | NAACL 2022 Conference Blind Submission | Readers: Everyone
Paper Link: https://openreview.net/forum?id=hKBRTZBaggi
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: We study parameter-efficient transfer learning methods for machine translation, which adapt a pre-trained model by fine-tuning only a small number of parameters. We conduct experiments across a diverse set of languages, comparing different fine-tuning methods in terms of (1) parameter budget, (2) language pair, and (3) pre-trained model. We show that methods such as adapters and prefix-tuning, which add new parameters to a pre-trained model, perform best. However, methods that fine-tune a subset of existing parameters, e.g. BitFit and cross-attention tuning, correlate better with pre-trained model capability. Furthermore, we find large performance variation across language pairs, with parameter-efficient methods struggling in particular on distantly related language pairs. Finally, we show that increasing model size while tuning only 0.03% of total parameters can outperform tuning 100% of the parameters of a smaller model.
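
For readers unfamiliar with the parameter-subset methods the abstract mentions, the snippet below is a minimal, illustrative sketch of BitFit-style tuning (freezing all weights and training only bias terms) using PyTorch and Hugging Face Transformers. The checkpoint name and setup are assumptions for illustration only and are not taken from the paper.

```python
# Minimal sketch of BitFit-style parameter-subset tuning (illustrative, not the paper's exact setup):
# freeze every weight of a pre-trained seq2seq model and leave only bias terms trainable.
from transformers import AutoModelForSeq2SeqLM

# Illustrative checkpoint; substitute the pre-trained MT model you actually use.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50")

for name, param in model.named_parameters():
    # Only parameters whose name marks them as bias terms remain trainable.
    param.requires_grad = name.endswith("bias")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable} / {total} ({100 * trainable / total:.3f}%)")
```

Adapter and prefix-tuning methods instead add new trainable modules to the frozen model; the idea above only illustrates the "tune a subset of existing parameters" family contrasted in the abstract.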