Abstract: Finetuning large language models (LLMs) has been empirically demonstrated to be effective on a variety of downstream tasks. To finetune LLMs efficiently, most prior work either focuses on parameter-efficient finetuning, which updates only a small number of parameters, or attempts to reduce the memory footprint during training. The memory footprint during finetuning mainly stems from three sources: the weights, the optimizer states, and the intermediate activations. However, existing methods still require considerable memory, as none of them mitigates the footprint from all three sources at once. In this paper, we present Quantized Side Tuning (QST), a novel memory-efficient and fast finetuning framework. QST operates in two stages: first, it quantizes the LLM to 4-bit to reduce the memory footprint of the weights; then it introduces a side network, separate from the LLM, which uses the hidden states of the LLM to make task-specific predictions. Using a separate side network avoids backpropagating through the LLM, which both cuts computation and removes the need to store the LLM's intermediate activations. Furthermore, QST employs several low-rank adaptors and gradient-free downsample modules to significantly reduce the number of trainable parameters, thereby shrinking the optimizer states. Experiments show that QST reduces the total memory footprint by up to $2.3\times$ and speeds up finetuning by up to $2.7\times$ while achieving competitive performance compared with the state-of-the-art. Compared with full finetuning, QST reduces the total memory footprint by up to $7\times$.
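To make the architecture described above concrete, the following is a minimal, hypothetical PyTorch sketch of the side-tuning idea only (not the authors' implementation): a frozen backbone exposes per-layer hidden states, and a small trainable side network consumes detached copies of them through low-rank downsample modules, so backpropagation never passes through the backbone. The toy backbone, all module names, and all dimensions are illustrative assumptions; real 4-bit quantization of the LLM (e.g. NF4 via bitsandbytes) is omitted to keep the sketch self-contained.

```python
# Hypothetical sketch of quantized side tuning's training setup (assumptions noted above).
import torch
import torch.nn as nn

class ToyBackboneLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Linear(d, d)

    def forward(self, x):
        return torch.relu(self.ff(x)) + x

class FrozenBackbone(nn.Module):
    """Stand-in for the quantized, frozen LLM; returns all hidden states."""
    def __init__(self, d=768, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([ToyBackboneLayer(d) for _ in range(n_layers)])
        for p in self.parameters():
            p.requires_grad_(False)

    @torch.no_grad()  # no activations are kept for backpropagation
    def forward(self, x):
        hidden_states = [x]
        for layer in self.layers:
            x = layer(x)
            hidden_states.append(x)
        return hidden_states

class LowRankDownsample(nn.Module):
    """Low-rank projection from the backbone width to the smaller side-network width."""
    def __init__(self, d_in, d_side, rank=16):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.up = nn.Linear(rank, d_side, bias=False)

    def forward(self, h):
        return self.up(self.down(h))

class SideNetwork(nn.Module):
    """Small trainable network that fuses detached backbone hidden states."""
    def __init__(self, d_backbone=768, d_side=96, n_layers=4, n_classes=2):
        super().__init__()
        self.adapters = nn.ModuleList(
            [LowRankDownsample(d_backbone, d_side) for _ in range(n_layers)]
        )
        self.blocks = nn.ModuleList([nn.Linear(d_side, d_side) for _ in range(n_layers)])
        self.head = nn.Linear(d_side, n_classes)

    def forward(self, hidden_states):
        s = torch.zeros(*hidden_states[0].shape[:-1], self.head.in_features,
                        device=hidden_states[0].device)
        for adapter, block, h in zip(self.adapters, self.blocks, hidden_states[1:]):
            # Detach each hidden state so the autograd graph stops at the side network.
            s = torch.relu(block(s + adapter(h.detach())))
        return self.head(s.mean(dim=1))  # pool over the sequence for classification

backbone, side = FrozenBackbone(), SideNetwork()
optimizer = torch.optim.AdamW(side.parameters(), lr=1e-3)  # optimizer states only for side params

x = torch.randn(2, 10, 768)          # toy (batch, sequence, hidden) inputs
labels = torch.tensor([0, 1])
logits = side(backbone(x))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                      # gradients touch only the side network
optimizer.step()
```

In this sketch the memory savings come from three places that mirror the abstract: the backbone holds no gradients or optimizer states, its forward pass stores no activations, and the only trainable parameters are the small low-rank adapters and side blocks.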
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.