Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Published: 09 Nov 2021, Last Modified: 22 Oct 2023
NeurIPS 2021 Poster
Keywords: parameter-efficient fine-tuning methods, adapters, low-rank methods, hypercomplex multiplication layers, adapting large-scale language models
Abstract: Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared "slow" weights and "fast" rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms standard fine-tuning on SuperGLUE and low-resource settings. Our code is publicly available at https://github.com/rabeehk/compacter.
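For concreteness, here is a minimal PyTorch sketch of the mechanism the abstract describes: the layer's weight is assembled as a sum of Kronecker products between small shared "slow" matrices and per-layer rank-one "fast" factors. This is an illustrative sketch, not the authors' released implementation; the class and argument names are made up for this example.

```python
import torch
import torch.nn as nn


class LowRankPHMLinear(nn.Module):
    """Sketch of a low-rank parameterized hypercomplex (PHM) linear layer.

    The weight is assembled on the fly as
        W = sum_i kron(A_i, s_i @ t_i),
    where the A_i are small n-by-n "slow" matrices (shared across layers
    in Compacter) and s_i, t_i are per-layer rank-one "fast" factors.
    """

    def __init__(self, in_features, out_features, n=4, rank=1, shared_A=None):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # n "slow" n-by-n matrices; pass the same tensor to every layer
        # to share them, as Compacter does across all adapter layers.
        self.A = shared_A if shared_A is not None else nn.Parameter(torch.randn(n, n, n))
        # Per-layer rank-one "fast" factors (random init here for brevity;
        # the paper initializes the adapter so it starts near the identity).
        self.s = nn.Parameter(0.01 * torch.randn(n, in_features // n, rank))
        self.t = nn.Parameter(0.01 * torch.randn(n, rank, out_features // n))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        B = torch.bmm(self.s, self.t)  # (n, in/n, out/n) rank-one blocks
        # Sum of Kronecker products; materialized explicitly for clarity,
        # though an efficient implementation would avoid building W at all.
        W = sum(torch.kron(self.A[i], B[i]) for i in range(self.n))
        return x @ W + self.bias


# Example: a bottleneck adapter built from two such layers sharing A.
shared_A = nn.Parameter(torch.randn(4, 4, 4))
down = LowRankPHMLinear(768, 64, n=4, shared_A=shared_A)
up = LowRankPHMLinear(64, 768, n=4, shared_A=shared_A)
h = torch.randn(2, 768)
out = h + up(torch.relu(down(h)))  # residual adapter block
```

Because only the rank-one factors and the small shared A_i matrices are trained, the per-task parameter count stays tiny, which is what yields the 0.047% figure above.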
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
TL;DR: We propose Compacter, a lightweight fine-tuning method for large-scale language models, and demonstrate that despite learning 2127.66x fewer parameters than standard fine-tuning, it obtains comparable performance.
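(The 2127.66x figure is consistent with the abstract's parameter fraction: 1 / 2127.66 ≈ 0.00047, i.e. 0.047% of the full model's parameters are trained.)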
Supplementary Material: pdf
Code: https://github.com/rabeehk/compacter
Community Implementations: 2 code implementations listed on CatalyzeX: https://www.catalyzex.com/paper/arxiv:2106.04647/code