Keywords: parameter-efficient fine-tuning; LLM post-training; large language models
TL;DR: A novel dynamic sparse fine-tuning method for efficient LLM adaptation.
Abstract: Parameter-efficient fine-tuning (PEFT) is crucial for adapting large language models (LLMs), yet existing methods trade off accuracy, latency, and compute: some add inference-time modules, others fix a static parameter set that can drift from evolving gradients, and dynamic variants can be costly. We propose **Gauss–Southwell Dynamic Update (GASDU)**, which performs *periodic Gauss–Southwell-$k$ selection*: every $M$ steps it uses the current gradient to select the $k$ largest-magnitude coordinates and updates only those entries, reusing the mask until the next refresh. The Top-$k$ selection is implemented in a streaming, tile-wise way that avoids materializing dense gradients, making the amortized refresh cost negligible. Theoretically, under a local Polyak–Łojasiewicz condition, we prove that GASDU enjoys a linear convergence rate scaled by a measurable gradient-retention factor, and we show that this factor degrades sublinearly within each refresh window. This sublinear decay implies that a moderate $M$ maintains a high retention factor, which in turn explains why GASDU behaves close to full fine-tuning. Empirically, GASDU sustains high retention between refreshes at an extreme parameter budget (0.01% of parameters), consistently outperforms strong PEFT baselines, and closely tracks or exceeds full fine-tuning across diverse commonsense and arithmetic reasoning benchmarks and LLMs (LLaMA-2/3 and GPT-OSS-20B).
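To make the selection rule concrete, below is a minimal PyTorch sketch of the periodic Gauss–Southwell-$k$ update described in the abstract. The class name `GASDUSketch`, the hyperparameter names (`refresh_every`, `tile`), and the chunked Top-$k$ routine are illustrative assumptions, not the authors' implementation.

```python
import torch

class GASDUSketch:
    """Illustrative sketch (not the paper's code): keep a mask over the k
    largest-magnitude gradient coordinates, refresh it every M steps, and
    apply plain SGD only to the masked entries."""

    def __init__(self, param: torch.nn.Parameter, k: int,
                 refresh_every: int, lr: float = 1e-4):
        self.param = param       # weight tensor being fine-tuned
        self.k = k               # sparse update budget (e.g. 0.01% of entries)
        self.M = refresh_every   # mask refresh period
        self.lr = lr
        self.t = 0               # step counter
        self.mask_idx = None     # flat indices of selected coordinates

    @torch.no_grad()
    def _refresh_mask(self, grad: torch.Tensor, tile: int = 1 << 20) -> None:
        # Streaming, tile-wise Top-k: scan the flattened gradient in tiles,
        # keeping only a running candidate set of size <= k, so no dense
        # |grad| sort buffer is ever materialized.
        flat = grad.reshape(-1)
        best_vals = flat.new_empty(0)
        best_idx = torch.empty(0, dtype=torch.long, device=flat.device)
        for start in range(0, flat.numel(), tile):
            chunk = flat[start:start + tile]
            vals, idx = chunk.abs().topk(min(self.k, chunk.numel()))
            best_vals = torch.cat([best_vals, vals])
            best_idx = torch.cat([best_idx, idx + start])
            if best_vals.numel() > self.k:         # prune candidates back to k
                keep = best_vals.topk(self.k).indices
                best_vals, best_idx = best_vals[keep], best_idx[keep]
        self.mask_idx = best_idx

    @torch.no_grad()
    def step(self) -> None:
        grad = self.param.grad
        if self.t % self.M == 0:                   # periodic GS-k mask refresh
            self._refresh_mask(grad)
        self.t += 1
        # Update only the k selected entries; all other weights stay frozen.
        flat_p = self.param.data.view(-1)          # assumes contiguous storage
        flat_p[self.mask_idx] -= self.lr * grad.reshape(-1)[self.mask_idx]
```

In a training loop, one would call `loss.backward()`, then `step()` once per tracked parameter, and zero gradients as usual. Under this reading, a natural way to measure the gradient-retention factor between refreshes is the ratio $\|g_{\text{mask}}\|^2 / \|g\|^2$ of masked to full gradient norms, though the paper's precise definition may differ.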
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1822