Abstract: Memory-efficient learning is crucial for reducing GPU consumption and enabling scalable training of large language models. Low-rank adaptation has proven effective for fine-tuning by injecting low-rank matrices into frozen pre-trained weights. However, these methods often lag behind full-rank training due to limited expressiveness and disrupted optimization dynamics. Conversely, projecting gradient updates onto a low-rank subspace improves training performance while simultaneously decreasing memory overhead. In this paper, we propose \textbf{Lotus}, a method that speeds up gradient projection via randomized SVD and further reduces memory cost. In addition, we propose an \textbf{adaptive subspace switching strategy} guided by the average displacement of the unit gradient, which enables dynamic subspace updates for improved convergence. Experimental results demonstrate that Lotus is currently \textbf{the most efficient method}, surpassing full-rank training both in pre-training LLaMA-type models on the C4 dataset and in fine-tuning across multiple tasks. Our code will be released soon.
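The abstract does not detail the projection mechanics, so the following is a minimal sketch, not the paper's implementation, of how randomized-SVD-based low-rank gradient projection typically works (in the spirit of GaLore-style methods): a randomized SVD estimates the dominant subspace of a weight gradient, optimizer state is kept in that low-rank subspace, and updates are projected back to full size. The rank `r`, the `niter` refinement count, and all function names here are illustrative assumptions.

```python
# Hypothetical sketch of randomized-SVD gradient projection;
# not the authors' Lotus implementation.
import torch


def compute_projector(grad: torch.Tensor, r: int = 8) -> torch.Tensor:
    # Randomized SVD of the gradient matrix; the columns of U span an
    # estimate of the dominant r-dimensional gradient subspace.
    U, _, _ = torch.svd_lowrank(grad, q=r, niter=2)
    return U  # shape: (m, r)


def project(grad: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    # Map the full (m, n) gradient into the (r, n) subspace, where the
    # optimizer states would be stored to save memory.
    return U.T @ grad


def project_back(low_rank_update: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    # Map the optimizer's low-rank update back to the full parameter space.
    return U @ low_rank_update


if __name__ == "__main__":
    g = torch.randn(1024, 4096)       # gradient of one weight matrix
    U = compute_projector(g, r=8)     # refreshed periodically / adaptively
    g_lr = project(g, U)              # (8, 4096) low-rank gradient
    update = project_back(g_lr, U)    # back to (1024, 4096)
    print(g_lr.shape, update.shape)
```

In such schemes the projector is recomputed only occasionally; the paper's adaptive subspace switching strategy decides when to refresh it based on the average displacement of the unit gradient, a criterion whose details are not given in this abstract.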
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Large Language Model, Pre-training, Fine-tuning
Contribution Types: Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English
Submission Number: 3748