Keywords: parameter-efficient fine-tuning; LLM post-training; large language models
TL;DR: A novel dynamic sparse fine-tuning method for efficient LLM adaptation.
Abstract: Parameter-efficient fine-tuning (PEFT) is crucial for adapting large language models (LLMs), yet existing methods trade off accuracy, latency, and compute: some add inference-time modules, others fix a static parameter set that can drift from evolving gradients, and dynamic variants can be costly. We propose **Gauss–Southwell Dynamic Update (GASDU)**, which performs *periodic Gauss–Southwell-$k$ selection*: every $M$ steps it uses the current gradient to select the $k$ largest-magnitude coordinates and updates only those entries, reusing the mask until the next refresh. The Top-$k$ selection is implemented in a streaming, tile-wise way that avoids materializing dense gradients, making the amortized refresh cost negligible. Theoretically, under a local Polyak–Łojasiewicz condition, we prove that GASDU enjoys a linear convergence rate scaled by a measurable gradient-retention factor, and we show that this factor degrades sublinearly within each refresh window. This sublinear decay implies that a moderate $M$ maintains a high retention factor, which in turn explains why GASDU behaves close to full fine-tuning. Empirically, GASDU sustains high retention between refreshes at an extreme parameter budget (0.01% of parameters), consistently outperforms strong PEFT baselines, and closely tracks or exceeds full fine-tuning across diverse commonsense and arithmetic reasoning benchmarks and LLMs (LLaMA-2/3 and GPT-OSS-20B).
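To make the selection rule concrete, below is a minimal PyTorch sketch of the periodic Gauss–Southwell-$k$ update described in the abstract. The class name `GASDUSketch`, the hyperparameter names (`refresh_every`, `tile`), and the chunked Top-$k$ routine are illustrative assumptions, not the authors' implementation.

```python
import torch

class GASDUSketch:
    """Illustrative sketch (not the paper's code): keep a mask over the k
    largest-magnitude gradient coordinates, refresh it every M steps, and
    apply plain SGD only to the masked entries."""

    def __init__(self, param: torch.nn.Parameter, k: int,
                 refresh_every: int, lr: float = 1e-4):
        self.param = param       # weight tensor being fine-tuned
        self.k = k               # sparse update budget (e.g. 0.01% of entries)
        self.M = refresh_every   # mask refresh period
        self.lr = lr
        self.t = 0               # step counter
        self.mask_idx = None     # flat indices of selected coordinates

    @torch.no_grad()
    def _refresh_mask(self, grad: torch.Tensor, tile: int = 1 << 20) -> None:
        # Streaming, tile-wise Top-k: scan the flattened gradient in tiles,
        # keeping only a running candidate set of size <= k, so no dense
        # |grad| sort buffer is ever materialized.
        flat = grad.reshape(-1)
        best_vals = flat.new_empty(0)
        best_idx = torch.empty(0, dtype=torch.long, device=flat.device)
        for start in range(0, flat.numel(), tile):
            chunk = flat[start:start + tile]
            vals, idx = chunk.abs().topk(min(self.k, chunk.numel()))
            best_vals = torch.cat([best_vals, vals])
            best_idx = torch.cat([best_idx, idx + start])
            if best_vals.numel() > self.k:         # prune candidates back to k
                keep = best_vals.topk(self.k).indices
                best_vals, best_idx = best_vals[keep], best_idx[keep]
        self.mask_idx = best_idx

    @torch.no_grad()
    def step(self) -> None:
        grad = self.param.grad
        if self.t % self.M == 0:                   # periodic GS-k mask refresh
            self._refresh_mask(grad)
        self.t += 1
        # Update only the k selected entries; all other weights stay frozen.
        flat_p = self.param.data.view(-1)          # assumes contiguous storage
        flat_p[self.mask_idx] -= self.lr * grad.reshape(-1)[self.mask_idx]
```

In a training loop, one would call `loss.backward()`, then `step()` once per tracked parameter, and zero gradients as usual. Under this reading, a natural way to measure the gradient-retention factor between refreshes is the ratio $\|g_{\text{mask}}\|^2 / \|g\|^2$ of masked to full gradient norms, though the paper's precise definition may differ.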
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1822