PC-LoRA: Progressive Model Compression with Low Rank Adaptation

Published: 05 Mar 2024, Last Modified: 12 May 2024 · PML4LRS Poster · CC BY 4.0
Keywords: Model Compression, Low Rank Adaptation, Parameter Efficient Fine-tuning
TL;DR: This work presents Progressive Compression LoRA (PC-LoRA), a novel extension of Low-Rank Adaptation (LoRA), designed to enable model compression and parameter-efficient fine-tuning concurrently for pre-trained models.
Abstract: This work presents Progressive Compression LoRA (PC-LoRA), a novel extension of Low-Rank Adaptation (LoRA), designed to enable model compression and parameter-efficient fine-tuning concurrently. To mitigate the computational costs of large-scale models, PC-LoRA introduces an approach that decays the pre-trained model weights to zero. This enables both model compression and efficient fine-tuning by progressively reducing the pre-trained weights during fine-tuning until they are completely removed. Through empirical analysis on various models, we demonstrate that PC-LoRA significantly reduces computational costs with only minor performance degradation. Compared to full fine-tuning and LoRA fine-tuning, PC-LoRA shows an average performance drop of 3.085%. Despite this, our method substantially compresses models: by 94.1% in parameters and 89.1% in FLOPs for vision models, and by 93.5% in parameters and 84.2% in FLOPs for NLP models.
Submission Number: 39
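
The mechanism described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the `PCLoRALinear` class, the linear decay schedule, and the rank value are illustrative assumptions. The idea is that the frozen pre-trained weight path is scaled by a factor that decays from 1 to 0 over fine-tuning, so that at the end only the low-rank adapter remains and the pre-trained weights can be discarded.

```python
import torch
import torch.nn as nn


class PCLoRALinear(nn.Module):
    """Linear layer whose frozen pre-trained path is progressively decayed
    to zero while a low-rank adapter (B @ A) is trained in its place.
    Hypothetical sketch of the PC-LoRA idea, not the paper's code."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.decay = 1.0  # lambda(t): scales the pre-trained path, 1 -> 0

    def set_decay(self, step: int, total_decay_steps: int) -> None:
        # Assumed linear schedule; the paper's exact schedule may differ.
        self.decay = max(0.0, 1.0 - step / total_decay_steps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Decayed pre-trained output plus the low-rank adapter output.
        # Once self.decay reaches 0, self.base contributes nothing and
        # can be removed, leaving only the compressed adapter.
        return self.decay * self.base(x) + x @ self.lora_A.T @ self.lora_B.T
```

In this sketch, `set_decay` would be called once per training step; after `total_decay_steps` the layer reduces to the adapter alone, which is where the reported parameter and FLOPs savings would come from.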