Prior-Driven Zeroth-Order Optimization for Scalable and Memory-Efficient LLM Fine-Tuning

ACL ARR 2025 May Submission 618 Authors

14 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Fine-tuning large language models (LLMs) has demonstrated exceptional performance across a variety of natural language processing (NLP) tasks. However, the increasing scale of these models imposes significant memory overhead during backpropagation. Zeroth-order (ZO) optimization mitigates this issue by estimating gradients from forward passes with Gaussian sampling, but its random sampling strategy introduces variance that scales linearly with the number of parameters, leading to slow convergence and suboptimal performance. We propose a novel gradient estimation framework that computes a guiding vector from Gaussian samples and uses it to direct the perturbations with which gradients are approximated. By incorporating this prior knowledge into the perturbation process, our method converges significantly faster than traditional ZO approaches. We further investigate whether a greedy strategy yields similar improvements in gradient estimation, providing additional insight into the optimization process. Theoretical analysis shows that the proposed gradient estimator aligns more closely with the true gradient direction, thereby improving optimization efficiency. Comprehensive experiments across LLMs of varying scales and architectures demonstrate that our method integrates seamlessly into diverse optimization frameworks, delivering faster convergence and substantial performance improvements over existing methods.
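
Since only the abstract is available here, the following is a minimal Python sketch of what a prior-driven ZO estimator of this kind could look like: a two-point (SPSA-style) finite-difference estimate whose Gaussian perturbation is tilted toward a guiding vector. The function name, the `beta` mixing weight, and the EMA update for the guide are illustrative assumptions, not details taken from the paper.

```python
import torch

def zo_grad_estimate(loss_fn, params, guide, mu=1e-3, beta=0.5):
    # Draw a Gaussian direction, then blend it with the guiding vector
    # so the perturbation carries prior knowledge of the descent direction.
    z = torch.randn_like(params)
    u = beta * guide + (1.0 - beta) * z
    u = u / (u.norm() + 1e-12)  # keep the perturbation scale fixed
    # Two forward passes (no backpropagation) give a finite-difference slope
    # along u; scaling u by that slope yields the gradient estimate.
    loss_plus = loss_fn(params + mu * u)
    loss_minus = loss_fn(params - mu * u)
    return (loss_plus - loss_minus) / (2.0 * mu) * u

# Toy usage: minimize a quadratic; here the guide is maintained as an
# exponential moving average of past estimates (an assumption for the demo).
params = torch.randn(1000)
guide = torch.zeros_like(params)
loss_fn = lambda p: (p ** 2).sum()
lr, ema = 1e-2, 0.9
for step in range(500):
    g_hat = zo_grad_estimate(loss_fn, params, guide)
    guide = ema * guide + (1.0 - ema) * g_hat  # accumulate prior knowledge
    params = params - lr * g_hat
```

Because the estimator needs only forward evaluations, peak memory stays at inference level, which is the motivation the abstract gives for ZO fine-tuning in the first place.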
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Memory-efficient fine-tuning, LLMs, Zeroth-Order optimization
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English
Keywords: Zeroth-order optimization, LLM fine-tuning, Memory-efficient
Submission Number: 618