How Low Can LoRA Go: System-Level Throughput, Energy, and Model Quality Tradeoffs when Fine-Tuning Adapters
Presentation: In-Person
Keywords: Low Rank Adaptation; Energy; Performance Modeling; Fine-tuning; Parameter Efficient Fine-Tuning; LLM; Extractive Question Answering
Presenter Full Name: Connor Espenshade
TL;DR: An investigation of LoRA adapter rank versus system and model performance, finding that lower ranks deliver equal model quality at a fraction of the memory and energy
Presenter Email: cje2136@columbia.edu
Abstract: As models scale beyond trillions of parameters, extending their functionality is increasingly achieved through fine-tuning existing base models. However, fine-tuning all parameters remains computationally expensive. Recent techniques such as Low-Rank Adaptation (LoRA) have been developed to reduce the number of trainable parameters. LoRA adapters have gained widespread adoption, but their effects on GPU system metrics, such as throughput and energy efficiency, are not yet well understood.
In this study, we examine these system-level metrics as a function of the LoRA adapter rank. Our findings show that reducing the rank of LoRA adapters does not lead to a significant drop in model quality, while simultaneously improving throughput, energy efficiency, and memory usage by up to 2.7x. Further, we find that the presence of a LoRA adapter, rather than its rank, is what greatly improves model quality compared to zero-shot inference with the base model. This makes smaller LoRA adapters a compelling choice from both a system and a model quality perspective.
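
For context, the sketch below illustrates the kind of rank sweep the abstract describes, using the HuggingFace PEFT library. It is a minimal illustration, not the authors' exact configuration: the base checkpoint, target modules, alpha heuristic, dropout, and rank values are assumptions. Lowering r directly shrinks the adapter's trainable-parameter count, which is the knob behind the throughput, energy, and memory results.

from transformers import AutoModelForQuestionAnswering
from peft import LoraConfig, TaskType, get_peft_model

for rank in (1, 4, 16, 64):  # hypothetical rank sweep
    # Reload a fresh base model each iteration so adapter layers do not stack.
    base = AutoModelForQuestionAnswering.from_pretrained("roberta-base")
    config = LoraConfig(
        task_type=TaskType.QUESTION_ANS,
        r=rank,                             # adapter rank: the variable under study
        lora_alpha=2 * rank,                # common scaling heuristic (assumption)
        target_modules=["query", "value"],  # attention projections (assumption)
        lora_dropout=0.05,
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()      # trainable-parameter count shrinks with rank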
Presenter Bio: Connor Espenshade is a Computer Engineering senior at Columbia University. He has six publications spanning AI, biology, and astrophysics. His computer architecture research focuses on analyzing and optimizing system performance for time and energy.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: https://youtu.be/KOYBOU8VOpE
YouTube Link Poster: N/A
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Google Slides: https://docs.google.com/presentation/d/1nckIF2JfDSVDkLO_xIQHUdPbenKqApk7qJPfUZE7vhU/edit?usp=sharing
Poster: Yes
Workshop Registration: Yes, the presenter has registered for the workshop.
YouTube Link Short: [to come]
Submission Number: 20