Abstract: The growing scale of pre-trained language models poses a challenge for fine-tuning on downstream tasks, especially in resource-constrained settings. Recent studies highlight that not all layers in Transformer-based language models contribute equally to downstream task performance, giving rise to various partial fine-tuning strategies. We propose a training-free approach to layer-wise partial fine-tuning that leverages the cosine similarity between representative tokens across layers to identify inter-layer relationships. Our method comprises two stages: (i) scoring layers by their relevance to the task via a single forward pass, and (ii) fine-tuning a subset of layers, either highest-scoring, lowest-scoring, or block-wise, while keeping the others frozen. We conduct experiments on 16 diverse NLP datasets, covering single-sentence and sentence-pair classification as well as generation tasks. Our method achieves performance competitive with full fine-tuning, with an average training speedup of 1.5× and a 75% reduction in trainable parameters, and outperforms all comparative baselines on 14 of the 16 evaluated datasets. Moreover, our approach shows no notable performance drop when the evaluation domain shifts, demonstrating robust cross-domain performance.
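To make the two-stage procedure concrete, the following is a minimal sketch, not the paper's implementation: it assumes a HuggingFace BERT-style encoder, uses the [CLS] hidden state as the representative token, and scores each layer by its cosine similarity to the final layer's representation as a stand-in for the task-relevance score; the paper's actual scoring criterion and selection rules (highest-scoring, lowest-scoring, block-wise) may differ.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer


def score_layers(model, tokenizer, text, device="cpu"):
    """Score each Transformer layer in a single forward pass, using the cosine
    similarity between its representative ([CLS]) token and the final layer's."""
    model.eval().to(device)
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states: (num_layers + 1) tensors of shape [batch, seq_len, dim];
    # drop the embedding output and keep the [CLS] vector of every layer.
    cls_per_layer = torch.stack([h[:, 0, :] for h in outputs.hidden_states[1:]])
    reference = cls_per_layer[-1]  # final layer's representative token
    scores = F.cosine_similarity(cls_per_layer, reference.unsqueeze(0), dim=-1)
    return scores.squeeze(-1)  # one score per layer


def freeze_all_but(model, layer_indices):
    """Freeze every parameter, then unfreeze only the selected encoder layers."""
    for p in model.parameters():
        p.requires_grad = False
    for idx in layer_indices:
        for p in model.encoder.layer[idx].parameters():
            p.requires_grad = True


if __name__ == "__main__":
    name = "bert-base-uncased"
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModel.from_pretrained(name)
    scores = score_layers(mdl, tok, "A sample sentence from the target task.")
    top_k = torch.topk(scores, k=3).indices.tolist()  # highest-scoring variant
    freeze_all_but(mdl, top_k)
    print("trainable layers:", sorted(top_k))
```

In this sketch, only the selected layers would receive gradient updates during fine-tuning; the lowest-scoring or block-wise variants would simply change which indices are passed to the freezing step.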