When does Parameter-Efficient Transfer Learning Work for Machine Translation?

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: We study parameter-efficient transfer learning methods for machine translation, which adapt a pre-trained model by fine-tuning only a small number of parameters. We conduct experiments across a diverse set of languages, comparing fine-tuning methods in terms of (1) parameter budget, (2) language pair, and (3) pre-trained model. We show that methods such as adapters and prefix-tuning, which add parameters to a pre-trained model, perform best. However, methods that fine-tune a subset of existing parameters, e.g. BitFit and cross-attention tuning, correlate better with pre-trained model capability. Furthermore, we find large performance variation across language pairs, with parameter-efficient methods struggling particularly on distantly related pairs. Finally, we show that increasing model size while tuning only 0.03% of total parameters can outperform tuning 100% of the parameters of a smaller model.
Paper Type: long
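
To make the "fine-tune a subset of existing parameters" family concrete, the sketch below shows BitFit-style tuning: all weights of a pre-trained translation model are frozen and only bias terms remain trainable, which is how parameter counts on the order of a fraction of a percent arise. The mBART checkpoint name is an illustrative assumption, not necessarily the model used in the paper.

```python
# Minimal BitFit-style sketch: freeze every weight of a pre-trained
# translation model and leave only the bias terms trainable.
# The checkpoint name below is an illustrative assumption.
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

trainable, total = 0, 0
for name, param in model.named_parameters():
    total += param.numel()
    if name.endswith(".bias"):
        param.requires_grad = True   # keep bias terms trainable
        trainable += param.numel()
    else:
        param.requires_grad = False  # freeze all other weights

print(f"Tuning {trainable / total:.4%} of {total:,} parameters")

# An optimizer would then be built over only the trainable parameters, e.g.:
# optim = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
```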