TOU: Truncated-factorized reduction for parameter-efficient model fine-tuning

Published: 10 Oct 2024, Last Modified: 01 Nov 2024, FITML 2024 Poster, CC BY 4.0
Keywords: full weight fine-tuning, low-rank factorization, low-rank adaptation, truncated singular value decomposition
TL;DR: TOU enables efficient full-weight model fine-tuning via truncated SVD factorization of weight matrices.
Abstract: Fine-tuning large-scale pre-trained models is an effective approach to transferring knowledge to new tasks. However, it typically requires updating all model parameters, which can incur considerable computational and memory costs. We propose a methodology, designated TOU, which employs truncated SVD to decompose weight matrices for comprehensive model fine-tuning. The objective is to retain the benefits of full fine-tuning while reducing computational and memory costs. Rather than updating the full weight matrices directly, TOU factorizes each weight matrix into low-rank components using truncated SVD and freezes one of the two factored matrices, thereby enabling efficient adaptation of the entire model. This significantly reduces the number of trainable parameters, leading to faster training and lower memory usage. After fine-tuning, the factored matrices can be reconstructed to recover the structure of the original model without any loss of performance. TOU utilises low-rank factorization of a reshaped and reorganised weight matrix to create space-efficient and expressive linear layers. Experiments on Vision Transformer models show that our method achieves a 70% reduction in trainable parameters, a 65% reduction in training time, and a 27% reduction in memory usage while maintaining performance comparable to full-weight fine-tuning (accuracy drops of less than 1%). Furthermore, TOU yields better performance than LoRA in terms of accuracy, training speed, and memory usage when the same target layers are fine-tuned.
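To make the mechanism described in the abstract concrete, the sketch below shows one plausible way to factorize a linear layer's weight via truncated SVD, train only one factor while freezing the other, and reconstruct a dense layer afterwards. This is a minimal illustration assuming PyTorch; the class name TruncatedSVDLinear, the choice of rank, and the decision to fold the singular values into the trainable factor are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TruncatedSVDLinear(nn.Module):
    """Hypothetical sketch: replace a dense weight W with a rank-r
    factorization W ~= A @ B from truncated SVD, freeze B, train A."""

    def __init__(self, linear: nn.Linear, rank: int):
        super().__init__()
        W = linear.weight.data                      # (out_features, in_features)
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        # Keep only the top-r singular triplets; fold singular values into A.
        A = U[:, :rank] * S[:rank]                  # (out_features, rank), trainable
        B = Vh[:rank, :]                            # (rank, in_features), frozen
        self.A = nn.Parameter(A)
        self.register_buffer("B", B)                # frozen factor (no gradients)
        self.bias = linear.bias

    def forward(self, x):
        # Equivalent to x @ (A @ B)^T + bias, with far fewer trainable parameters.
        return F.linear(x, self.A @ self.B, self.bias)

    def reconstruct(self) -> nn.Linear:
        """After fine-tuning, rebuild a standard dense layer from A @ B."""
        out_f, in_f = self.A.shape[0], self.B.shape[1]
        layer = nn.Linear(in_f, out_f, bias=self.bias is not None)
        layer.weight.data = (self.A @ self.B).detach()
        if self.bias is not None:
            layer.bias.data = self.bias.detach()
        return layer

# Example usage (illustrative): wrap an existing layer, fine-tune, then restore.
dense = nn.Linear(768, 768)
factored = TruncatedSVDLinear(dense, rank=64)      # only A (768x64) and bias train
restored = factored.reconstruct()                  # recovers a plain nn.Linear
```

Under these assumptions, the trainable weight count for the layer drops from out_features x in_features to out_features x rank, which is the source of the parameter, time, and memory savings the abstract reports; the exact ranks and target layers used in the paper are not specified here.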
Submission Number: 63