Abstract: Although many competitive PEFT methods such as LoRA exist, there is still a need for a PEFT method that is efficient under the single-backbone multi-tenant setting while remaining competitive on downstream tasks. In this work, we propose a novel PEFT method, \underline{P}rompt \underline{A}ware \underline{R}epresentation \underline{AD}justm\underline{E}nt (PARADE). First, we install a lightweight vector generator at each Transformer layer to generate vectors that modify the hidden states of the multi-head self-attention (MHSA) and position-wise feed-forward (FFN) modules and, as a result, modulate the behavior of the pre-trained backbone. Second, each vector generator is a bottleneck module consisting of a pooling operation, two linear projections, and an activation function. To enhance the downstream performance of the vector generators, we propose an attention-based capsule network as the pooling operation, which effectively summarizes the semantic information in the input instructions. We have conducted experiments on a variety of tasks, and the results demonstrate that (a) PARADE outperforms recent baselines with a comparable number of tunable parameters, and (b) PARADE is more efficient than LoRA under the single-backbone multi-tenant setting.\footnote{Code and fine-tuned models will be open-sourced to facilitate future research.}
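To make the described architecture concrete, below is a minimal PyTorch sketch of a per-layer vector generator with the bottleneck structure named in the abstract (pooling, two linear projections, an activation). The attention-based pooling here is a simplified stand-in for the paper's capsule-network pooling, and the hidden sizes, activation choice, and the additive way the generated vectors are applied to the hidden states are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch of a PARADE-style vector generator (assumptions noted above).
import torch
import torch.nn as nn


class VectorGenerator(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 16):
        super().__init__()
        # Attention-based pooling: a learned query scores each instruction token.
        # (Stand-in for the paper's attention-based capsule network.)
        self.query = nn.Linear(hidden_size, 1)
        # Bottleneck: down-projection -> activation -> up-projection.
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        # Produce two modulation vectors, one for MHSA and one for FFN.
        self.up = nn.Linear(bottleneck, 2 * hidden_size)

    def forward(self, prompt_states: torch.Tensor):
        # prompt_states: (batch, prompt_len, hidden_size)
        scores = self.query(prompt_states).softmax(dim=1)   # (B, L, 1)
        pooled = (scores * prompt_states).sum(dim=1)        # (B, H)
        vec = self.up(self.act(self.down(pooled)))          # (B, 2H)
        mhsa_vec, ffn_vec = vec.chunk(2, dim=-1)
        return mhsa_vec, ffn_vec


if __name__ == "__main__":
    gen = VectorGenerator(hidden_size=768)
    prompt = torch.randn(2, 10, 768)          # instruction token representations
    mhsa_vec, ffn_vec = gen(prompt)
    hidden = torch.randn(2, 32, 768)          # MHSA output for the full sequence
    hidden = hidden + mhsa_vec.unsqueeze(1)   # one possible adjustment: additive
```

Because only the small generators are tuned per task while the backbone stays frozen and unmodified in its weights, many such generators can share one backbone, which is the single-backbone multi-tenant efficiency argument the abstract makes.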
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English