Abstract: Fine-tuning is the standard approach for adapting pre-trained large language models to specific downstream tasks. However, as model size increases, the energy and time required to fully fine-tune all parameters can become prohibitively large for many applications. While recent advances in parameter-efficient transfer learning have reduced the number of parameters that must be updated, the training time and energy consumption of these methods remain close to those of full fine-tuning. In this paper, we propose a time-efficient fine-tuning method based on feature extraction: we treat off-the-shelf language models as fixed sources of embeddings and train a small feed-forward network on top for each downstream task. Averaged over the NLI tasks of the GLUE benchmark, our method trains $124$ times faster than full fine-tuning and $101$ times faster than parameter-efficient fine-tuning methods with DistilRoBERTa, while retaining $81.9\%$ and $85.0\%$ of their respective performance.
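To make the recipe concrete, below is a minimal sketch of the feature-extraction approach the abstract describes: a frozen off-the-shelf encoder supplies embeddings, and only a small feed-forward head is trained per task. The pooling strategy, head sizes, learning rate, and toy data are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of feature-extraction fine-tuning: the language model is
# frozen and only a small per-task feed-forward head is trained.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
encoder = AutoModel.from_pretrained("distilroberta-base")
encoder.eval()  # the encoder stays fixed: no gradients, no parameter updates
for p in encoder.parameters():
    p.requires_grad = False

@torch.no_grad()
def embed(sentences):
    """Extract fixed sentence embeddings; mean pooling is an assumption here."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # mean over real tokens

# The small feed-forward head is the only trainable component.
head = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on toy NLI-style inputs.
features = embed(["a premise and its hypothesis", "another sentence pair"])
labels = torch.tensor([0, 1])
optimizer.zero_grad()
loss = loss_fn(head(features), labels)
loss.backward()  # gradients flow only through the head
optimizer.step()
```

Because the encoder is frozen, the embeddings for a dataset can in principle be computed once and reused across epochs and hyperparameter runs, which is consistent with the large training-time reductions reported in the abstract.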
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English