An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning

Cen-Jhih Li; Aditya Bhaskara

Published: 02 Dec 2025, Last Modified: 02 Dec 2025

Accepted by TMLR

License: CC BY 4.0

Abstract: Fine-tuning is an important step in adapting foundation models, such as large language models, to downstream tasks. To make this step accessible to users with limited computational budgets, it is crucial to develop fine-tuning methods that are efficient in both memory and computation. Sparse fine-tuning (SpFT) and low-rank adaptation (LoRA) are two frameworks that have emerged to address this problem and have been widely adopted in practice. In this work, we develop a new SpFT framework based on ideas from neural network pruning. At a high level, we first identify "important" neurons/nodes using feature importance metrics from network pruning (specifically, structural pruning), and then fine-tune only the weights involving these neurons. Experiments on common language tasks show that our method improves SpFT's memory efficiency by 20–50% while matching the accuracy of state-of-the-art methods such as LoRA variants.
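
As a rough illustration of the two-stage idea in the abstract (score neurons, then update only the weights tied to the selected ones), the following is a minimal sketch in plain Python. It is not the authors' implementation: the scoring rule (the L2 norm of each neuron's weight row), the treatment of a neuron as one row of a weight matrix, and the names `neuron_importance`, `select_neurons`, and `sparse_update` are all assumptions made for illustration. The abstract does not specify the importance metric beyond saying it comes from structural pruning.

```python
# Hypothetical sketch: the scoring rule (row L2 norm) and every name below are
# illustrative assumptions, not the importance metric used in the paper.
import math


def neuron_importance(weight):
    """Score each output neuron (one row of the matrix) by its L2 norm."""
    return [math.sqrt(sum(w * w for w in row)) for row in weight]


def select_neurons(scores, k):
    """Return the indices of the k highest-scoring neurons, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_update(weight, grad, selected, lr):
    """Apply one SGD step to the selected rows only; other rows stay frozen."""
    chosen = set(selected)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in chosen else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weight, grad))
    ]


# Toy 4-neuron layer with 2 inputs per neuron and a dummy gradient.
weight = [[0.1, 0.0], [2.0, -1.0], [0.5, 0.5], [3.0, 0.0]]
grad = [[1.0, 1.0] for _ in weight]

selected = select_neurons(neuron_importance(weight), k=2)
updated = sparse_update(weight, grad, selected, lr=0.1)
```

In this toy setting only the rows of the two selected neurons change, so gradients and optimizer state would be needed for those rows alone. That is the kind of saving a sparse scheme aims for, although the figures reported in the abstract come from the authors' own method and experiments, not from this sketch.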

Submission Length: Long submission (more than 12 pages of main content)

Code: https://github.com/CenjhihLi/sparsity_finetuning

Assigned Action Editor: ~Guillaume_Rabusseau1

Submission Number: 5371
