Generative Parameter Efficient Fine-Tuning

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Parameter Efficient Fine-Tuning, Transfer Learning
TL;DR: Generative Parameter Efficient Fine-Tuning (GIFT) learns an explicit, linear mapping between the pretrained and fine-tuned models, and outperforms prior methods with ~15 times fewer parameters.
Abstract: Fine-tuning pretrained (large) Transformer backbones efficiently for downstream tasks has been extensively explored using both Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA) and its variants, and more recent Representation-Efficient Fine-Tuning (ReFT) approaches. In both of these formulations, fine-tuning weights for selected pretrained layers are treated as model parameters that are directly learned from the downstream task data, often making them layer-specific. While these methods simultaneously aim for memory efficiency, some approaches, such as VeRA (Vector-based Random matrix Adaptation), may not achieve this consistently in practice. In this paper, we propose a novel approach for generating fine-tuning weights through a configurable layer-sharing mechanism, termed Generative parameter-efficient Fine-Tuning (GIFT). GIFT uses a simple parameterization scheme involving two linear layers (without bias terms) to enable efficient fine-tuning. This method bridges the gap between PEFT and ReFT, ensuring both parameter and memory efficiency. GIFT can be viewed as a variant of LoRA with parameters shared across layers, conditioned on the pretrained weights, with significantly fewer trainable parameters. Through extensive experiments, we demonstrate that GIFT consistently achieves superior performance and parameter efficiency compared to baselines on commonsense and arithmetic reasoning tasks, instruction tuning with the Llama family of models, and visual recognition benchmarks with Vision Transformers. Notably, GIFT achieves a 5.7% absolute increase in average accuracy with a 14x reduction in trainable parameters compared to LoRA on the Commonsense170k dataset using Llama-3 (8B), and a 5.4% increase in win rate with a 4x reduction in parameters using Llama-2 (7B) during instruction tuning. Our method also attains a slightly higher win rate for instruction tuning than GPT-3.5 (Turbo 1106).
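The abstract's description of GIFT, two bias-free linear layers shared across layers that map a pretrained weight matrix to its fine-tuning residual, can be pictured with the minimal PyTorch sketch below. This is an illustration based only on the text above, not the authors' implementation; the module name `GIFTSketch`, the zero initialization of the second map, and the chosen shapes and rank are assumptions.

```python
import torch
import torch.nn as nn


class GIFTSketch(nn.Module):
    """Illustrative sketch (not the authors' code): two shared, bias-free
    linear maps generate a fine-tuning residual conditioned on each
    selected pretrained weight matrix, as described in the abstract."""

    def __init__(self, in_features: int, rank: int):
        super().__init__()
        # Two linear layers without bias terms, shared across all selected layers.
        self.down = nn.Linear(in_features, rank, bias=False)  # phi: d -> r
        self.up = nn.Linear(rank, in_features, bias=False)    # theta: r -> d
        # Assumption: zero-init the second map so fine-tuning starts from the
        # unmodified pretrained model.
        nn.init.zeros_(self.up.weight)

    def delta(self, pretrained_weight: torch.Tensor) -> torch.Tensor:
        # Residual conditioned on the pretrained weight: dW = (W @ phi) @ theta.
        return self.up(self.down(pretrained_weight))


# Usage: one GIFT module is shared across layers of the same width,
# so the trainable-parameter count is independent of the number of layers.
gift = GIFTSketch(in_features=768, rank=16)
layer_weights = [torch.randn(768, 768) for _ in range(12)]  # stand-ins for pretrained W
finetuned_weights = [w + gift.delta(w) for w in layer_weights]
```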
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8366