Training Neural Networks from Scratch with Parallel Low-Rank Adapters

19 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Low-rank adapters
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Using LoRA to train models from scratch
Abstract: The scalability of deep learning applications is fundamentally constrained by compute, memory, and communication. While low-rank adaptation (LoRA) has reduced these costs for model fine-tuning, its application to model pre-training remains largely unexplored. This paper examines the extension of LoRA to model pre-training, identifying the constraints and limitations inherent to standard LoRA in this setting. We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm that enables parallel training of multiple low-rank heads across compute nodes while minimizing the need for frequent synchronization. We conduct experiments on vision transformers trained on ImageNet100, demonstrating that LTE is competitive with standard distributed training methods. Initial scalability tests on ImageNet1k show that LTE can match standard training performance given additional training iterations.
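For readers unfamiliar with the low-rank parameterization the abstract refers to, below is a minimal PyTorch-style sketch of a linear layer with several parallel low-rank (LoRA) heads and an infrequent merge step. This is illustrative only, not the authors' implementation: the class name `ParallelLoRALinear`, the hyperparameters, and the merge-by-averaging rule are assumptions made for the sketch.

```python
import torch
import torch.nn as nn


class ParallelLoRALinear(nn.Module):
    """Sketch of a frozen base weight with several parallel low-rank heads.

    Each head parameterizes an additive update (alpha / rank) * B @ A on top of
    the frozen base weight. The merge rule used here (average the heads' updates,
    fold them into the base weight, then re-zero B) is an assumption for
    illustration; the paper's actual synchronization scheme may differ.
    """

    def __init__(self, in_features, out_features, rank=8, num_heads=4, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False
        )
        nn.init.kaiming_uniform_(self.weight)
        # One (A, B) pair per head; B starts at zero so every head's initial update is zero.
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, in_features) * 0.01) for _ in range(num_heads)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, rank)) for _ in range(num_heads)]
        )
        self.scale = alpha / rank

    def forward(self, x, head):
        # Each worker trains only its own head's low-rank parameters.
        delta = self.scale * (self.B[head] @ self.A[head])
        return x @ (self.weight + delta).t()

    @torch.no_grad()
    def merge(self):
        # Infrequent synchronization: fold the averaged low-rank update into the
        # base weight and re-zero B so training continues from the merged point.
        delta = torch.stack(
            [self.scale * (b @ a) for a, b in zip(self.A, self.B)]
        ).mean(dim=0)
        self.weight += delta
        for b in self.B:
            b.zero_()
```

In this reading of the abstract, each compute node optimizes only its own (A, B) pair on local data, and merge() is invoked only at the infrequent synchronization points rather than every step.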
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2082