Abstract: Recently, Parameter-Efficient Fine-Tuning (PEFT), which adjusts or introduces a small number of trainable parameters to adapt pre-trained models to downstream tasks, has become a prominent research topic. However, existing PEFT methods within the traditional fine-tuning framework have two main shortcomings: 1) they overlook the explicit association between trainable parameters and downstream knowledge, and 2) they neglect the interaction between the intrinsic task-agnostic knowledge of pre-trained models and the task-specific knowledge of downstream tasks. These oversights lead to insufficient utilization of knowledge and suboptimal performance. To address these issues, we propose a novel fine-tuning framework, named GIST, that can be seamlessly integrated into current PEFT methods in a plug-and-play manner. Specifically, our framework first introduces a trainable token, called the Gist token, when applying PEFT methods to downstream tasks. This token serves as an aggregator of the task-specific knowledge learned by the PEFT methods and builds an explicit association with downstream tasks. Furthermore, to facilitate explicit interaction between task-agnostic and task-specific knowledge, we introduce the concept of knowledge interaction via a Bidirectional Kullback-Leibler Divergence objective. As a result, PEFT methods within our framework enable the pre-trained model to understand downstream tasks more comprehensively by fully leveraging both types of knowledge. Extensive experiments on 35 datasets demonstrate the universality and scalability of our framework. Notably, a PEFT method within our GIST framework achieves up to a 2.25% improvement on the VTAB-1K benchmark while adding just 0.8K parameters (0.009‰ of ViT-B/16). Code is provided in the supplementary materials.
Primary Subject Area: [Content] Vision and Language
Relevance To Conference: In the era of large models, fine-tuning on downstream tasks is crucial for their ability to understand complex multimedia content. This paper investigates Parameter-Efficient Fine-Tuning (PEFT) methods, introducing the Gist token as a plug-and-play module and incorporating a Bidirectional Kullback-Leibler Divergence (BKLD) objective. This approach facilitates interaction between the inherent task-agnostic knowledge of pre-trained models and the task-specific knowledge of downstream tasks, improving the fine-tuning performance of existing PEFT methods without significantly increasing the parameter count.
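For illustration, a minimal sketch of the kind of bidirectional (symmetric) KL term described above, written in PyTorch. The pairing of the two logit sources is an assumption for this sketch: `logits_agnostic` is taken to come from a head driven by the frozen pre-trained (task-agnostic) representation and `logits_specific` from a head driven by the task-specific Gist token; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def bidirectional_kl(logits_agnostic: torch.Tensor,
                     logits_specific: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence KL(p || q) + KL(q || p) between two
    class distributions given as logits of shape (batch, num_classes).

    Hypothetical pairing: one distribution from the task-agnostic branch,
    the other from the Gist-token (task-specific) branch.
    """
    log_p = F.log_softmax(logits_agnostic, dim=-1)
    log_q = F.log_softmax(logits_specific, dim=-1)
    # F.kl_div(input, target) computes KL(target || input); both are
    # log-probabilities here (log_target=True), averaged over the batch.
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
    return kl_pq + kl_qp

# Usage sketch: combine with the task loss using a weighting factor
# (the weight value here is illustrative, not taken from the paper).
# loss = task_loss + 0.1 * bidirectional_kl(logits_agnostic, logits_specific)
```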
Supplementary Material: zip
Submission Number: 588