Keywords: PEFT, Quantization, Compression, Finetuning, Foundation Model, LLM training
TL;DR: We propose a PEFT approach with low quantization error.
Abstract: Fine-tuning is essential for adapting large language models to downstream tasks, but it can be costly for users with limited resources. To address this, Sparse Fine-Tuning (SpFT) and Low-Rank Adaptation (LoRA) have been widely adopted for efficient fine-tuning. In this work, we propose a new SpFT framework inspired by neural network pruning: we identify important neurons using structural pruning and fine-tune only the weights associated with them. Experiments on common language tasks show our method improves SpFT’s memory efficiency by 20–50% while matching the accuracy of state-of-the-art methods such as LoRA variants.
Submission Number: 27
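The abstract describes the core recipe at a high level: score neurons with a structural-pruning criterion, then restrict gradient updates to the weights of the selected neurons. Below is a minimal PyTorch sketch of that idea, under assumptions not stated in the abstract: the importance score here is the L2 norm of each output neuron's incoming weights (a common pruning heuristic, used as a stand-in for the paper's actual criterion), and the helper names `select_important_neurons` and `sparse_finetune_step` are hypothetical, for illustration only.

```python
import torch
import torch.nn as nn

def select_important_neurons(linear: nn.Linear, k: int) -> torch.Tensor:
    """Rank output neurons by a structural-pruning-style importance score.

    Assumption: L2 norm of each neuron's incoming weights serves as the
    importance criterion; the paper's actual scoring rule may differ.
    """
    scores = linear.weight.detach().norm(dim=1)  # one score per output neuron
    return torch.topk(scores, k).indices         # indices of the top-k neurons

def sparse_finetune_step(linear: nn.Linear, idx: torch.Tensor,
                         x: torch.Tensor, target: torch.Tensor,
                         lr: float = 1e-3) -> None:
    """One fine-tuning step that updates only the selected neurons' weights."""
    loss = nn.functional.mse_loss(linear(x), target)
    loss.backward()
    with torch.no_grad():
        # Apply the gradient only to rows belonging to important neurons,
        # leaving the remaining pretrained weights frozen.
        linear.weight[idx] -= lr * linear.weight.grad[idx]
        if linear.bias is not None:
            linear.bias[idx] -= lr * linear.bias.grad[idx]
    linear.zero_grad()

# Usage: pick the top 16 neurons of a layer, then take one sparse update step.
layer = nn.Linear(64, 32)
idx = select_important_neurons(layer, k=16)
sparse_finetune_step(layer, idx, torch.randn(8, 64), torch.randn(8, 32))
```

Because only the selected rows (and their optimizer state, if any) are ever updated, the trainable footprint scales with the number of important neurons rather than the full layer, which is consistent with the memory savings the abstract reports.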