A Unified Convergence Theory for Large Language Model Efficient Fine-tuning

Published: 10 Oct 2024 · Last Modified: 07 Dec 2024 · NeurIPS 2024 Workshop · CC BY 4.0
Keywords: Convergence, Fine-tuning, Low rank adaptation, Representation fine-tuning, Optimization
TL;DR: We establish the connection between LoRA and ReFT, unify them into a meta-algorithm dubbed MeFT, and analyze its convergence rate.
Abstract: Parameter-efficient fine-tuning (PEFT), which adapts a model by updating only a small fraction of its parameters, has gained considerable attention, and low-rank adaptation (LoRA) is deemed the state-of-the-art PEFT technique. Although LoRA has seen great success in numerous fields, a theoretical gap remains in understanding how it converges. More recently, representation fine-tuning (ReFT) was developed, which instead fine-tunes a model's hidden representations, where significant semantic information is encoded, and it seemingly yields better performance than LoRA. In this work, we first establish the connection between LoRA and ReFT and then unify them into a meta-algorithm, dubbed model efficient fine-tuning (MeFT). MeFT not only provably attains the best available convergence rate, matching that of existing algorithms, but also theoretically reveals the relationship between the rank and the convergence error. Our analysis advances the theoretical understanding of how low-rank decomposition fine-tuning techniques drive LLMs and offers useful insights for the design of more efficient future algorithms.
Submission Number: 60
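
The abstract does not spell out MeFT's formulation, but the LoRA/ReFT connection it alludes to can be sketched: both methods confine the trainable update to a rank-r subspace, LoRA on a weight matrix (ΔW = BA) and ReFT on a hidden representation. The PyTorch sketch below illustrates the two parameterizations under these assumptions; the class names, initialization scales, and the LoReFT-form intervention h + Rᵀ(Wh + b − Rh) are illustrative choices taken from the broader LoRA/ReFT literature, not definitions from this paper (the orthonormality constraint on R used in LoReFT is also omitted for brevity).

```python
# A minimal sketch of the two low-rank parameterizations (assumed forms, not
# the paper's MeFT definition): LoRA adds a trainable rank-r update to frozen
# weights, while a ReFT-style intervention edits a hidden representation
# within a rank-r subspace (LoReFT form: h + R^T(Wh + b - Rh)).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = W0 x + (alpha/r) B A x."""

    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight W0 stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # BA = 0 at init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


class ReFTIntervention(nn.Module):
    """Low-rank edit of a hidden state h: h + R^T (W h + b - R h), rank-r subspace R."""

    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # Note: LoReFT constrains R to have orthonormal rows; omitted here.
        self.R = nn.Parameter(torch.randn(rank, hidden_dim) * 0.01)
        self.proj = nn.Linear(hidden_dim, rank)  # learned target coordinates W h + b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Move h toward the learned target only within the subspace spanned by R.
        return h + (self.proj(h) - h @ self.R.T) @ self.R


if __name__ == "__main__":
    x = torch.randn(4, 64)
    lora = LoRALinear(nn.Linear(64, 64), rank=8)
    reft = ReFTIntervention(hidden_dim=64, rank=8)
    print(lora(x).shape, reft(x).shape)  # both: torch.Size([4, 64])
```

In both classes the number of trainable parameters scales with the rank r rather than the full hidden dimension, which is the shared structure a unifying analysis of the rank-versus-convergence-error trade-off would exploit.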