HART: Efficient Adaptation via Regularized Autoregressive Parameter Generation

23 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: hypernetwork, weight generation, parameter-efficient fine-tuning, task adaptation, in-context learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a novel hypernetwork approach to generate PEFT module weights for efficient task adaptation.
Abstract: Fine-tuning is an effective approach for adapting a pre-trained language model to downstream tasks, but it incurs a high computational cost. To achieve extremely efficient task adaptation, \citet{phang2022hypertuning} proposed using an auxiliary hypernetwork to generate task-specific weights without any backpropagation. A hypernetwork can generate weights for parameter-efficient fine-tuning (PEFT) modules, such as prefixes \citep{li2021prefix} and LoRAs \citep{hu2021lora}, for any unseen task from a few task-specific demonstration examples, at the cost of a single forward pass. However, hypernetwork training is challenging. First, it is sample-inefficient because it under-exploits the dependencies between PEFT weights across layers. Second, it is unstable because the few-shot demonstration inputs are highly diverse. To address these limitations, we propose a novel hypernetwork training approach, named HART. It exploits layerwise dependencies by autoregressively generating weights for individual layers, and it stabilizes training by regularizing the consistency between weights generated from different demonstrations. We train the hypernetwork on a diverse collection of tasks \citep{wang2022super,sanh2021multitask} and evaluate its performance on unseen tasks. HART notably outperforms \citet{phang2022hypertuning} on both T5-Large and T5-XL models.
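To make the two ideas in the abstract concrete, the sketch below illustrates (i) autoregressive, layer-by-layer generation of flattened PEFT (e.g., LoRA) weights conditioned on a demonstration encoding and the previously generated layer, and (ii) a consistency regularizer between weights generated from two different demonstration sets of the same task. This is a minimal illustration under assumed design choices (the `AutoregressiveHypernet` class, the GRU-cell recurrence, and `consistency_loss` are hypothetical names, not the authors' implementation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AutoregressiveHypernet(nn.Module):
    """Sketch: generate flattened PEFT (e.g., LoRA) weights layer by layer,
    conditioning each layer on the demonstration encoding and on the weights
    generated for the previous layer (the autoregressive dependency)."""

    def __init__(self, num_layers: int, demo_dim: int, peft_numel: int, hidden_dim: int = 512):
        super().__init__()
        self.num_layers = num_layers
        self.peft_numel = peft_numel
        # Hypothetical components: a GRU cell carries layerwise state,
        # a linear head maps the state to flattened PEFT parameters.
        self.cell = nn.GRUCell(demo_dim + peft_numel, hidden_dim)
        self.head = nn.Linear(hidden_dim, peft_numel)

    def forward(self, demo_emb: torch.Tensor) -> torch.Tensor:
        # demo_emb: (batch, demo_dim), e.g., a pooled encoding of the few-shot demonstrations
        batch = demo_emb.size(0)
        h = demo_emb.new_zeros(batch, self.cell.hidden_size)
        prev = demo_emb.new_zeros(batch, self.peft_numel)
        weights = []
        for _ in range(self.num_layers):  # autoregressive loop over transformer layers
            h = self.cell(torch.cat([demo_emb, prev], dim=-1), h)
            prev = self.head(h)           # flattened PEFT weights for this layer
            weights.append(prev)
        return torch.stack(weights, dim=1)  # (batch, num_layers, peft_numel)


def consistency_loss(hypernet: nn.Module, demo_emb_a: torch.Tensor, demo_emb_b: torch.Tensor) -> torch.Tensor:
    """Regularize weights generated from two different demonstration sets of the
    same task to agree, which is one way to stabilize hypernetwork training."""
    w_a = hypernet(demo_emb_a)
    w_b = hypernet(demo_emb_b)
    return F.mse_loss(w_a, w_b)
```

In this sketch the regularizer would be added to the main task loss with a weighting coefficient; the exact conditioning, generation order, and regularization form used by HART are described in the paper itself.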
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7565