Stochastic Fine-Tuning of Language Models Using Masked Gradients

Mohammad Akbar-Tajari; Mohammad Taher Pilehvar

Published: 09 Jun 2025, Last Modified: 08 Jul 2025. Venue: KDD 2025 Workshop SciSocLLM. License: CC BY 4.0

Keywords: Scalable Fine-Tuning, Efficient Adaptation of Language Models, Context-Sensitive Parameter Updates

TL;DR: Stochastic Tuning selectively updates 0.08% of a language model's parameters, reducing computational cost and overfitting while improving task-specific performance through context-aware fine-tuning.

Abstract: Large Language Models (LLMs) have emerged as the dominant paradigm in Natural Language Processing owing to their remarkable performance across a variety of target tasks. However, naively fine-tuning them for specific downstream tasks often requires updating a vast number of parameters, which leads to high computational costs and to overfitting when training data is limited. In this paper, we propose a novel approach, called Stochastic Tuning, that addresses these challenges by selectively updating a small subset of parameters at each step of the tuning process. The approach customizes its updates using task-specific partial gradients with respect to stochastic sub-networks. Its advantage over existing solutions lies in its ability to consider both parameter weights and forward values, which guarantees context-sensitive fine-tuning. Our experiments demonstrate that Stochastic Tuning outperforms existing lightweight fine-tuning methods, improving average performance by over two points on RoBERTa across several tasks in the GLUE benchmark while updating merely 0.08% of the model's parameters. The code for our implementation can be found at https://github.com/m-Tajari/StocTuning_LLMs.
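
To make the idea of a masked-gradient update concrete, here is a minimal NumPy sketch of a single step that changes only a small random subset of parameters. It is an illustration under stated assumptions, not the authors' implementation: the function name `masked_sgd_step`, the toy parameter shape, the learning rate, and the uniform random choice of coordinates are all hypothetical. In particular, the abstract says the actual method selects updates using parameter weights and forward values, which this sketch does not attempt to reproduce.

```python
import numpy as np

def masked_sgd_step(params, grads, fraction, lr, rng):
    """Apply a plain SGD update to a random subset of entries only.

    Assumption: coordinates are chosen uniformly at random. This is a
    stand-in for illustration, not the selection rule from the paper.
    """
    n_total = params.size
    n_update = max(1, int(round(fraction * n_total)))

    # Draw the coordinates that are allowed to change in this step.
    idx = rng.choice(n_total, size=n_update, replace=False)
    mask = np.zeros(n_total, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(params.shape)

    # Zero the gradient everywhere outside the mask, then take the step.
    new_params = params - lr * np.where(mask, grads, 0.0)
    return new_params, mask

rng = np.random.default_rng(0)
params = rng.normal(size=(1000, 125))   # 125,000 toy parameters
grads = rng.normal(size=params.shape)   # stand-in for task gradients

new_params, mask = masked_sgd_step(params, grads, fraction=0.0008, lr=0.1, rng=rng)
print(int(mask.sum()), "of", params.size, "entries were eligible for update")
```

With `fraction=0.0008` (that is, 0.08%), only 100 of the 125,000 toy entries are eligible for an update in this step, and every entry outside the mask is left exactly as it was.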

Submission Number: 2
