Prior-Driven Zeroth-Order Optimization for Scalable and Memory-Efficient LLM Fine-Tuning

ACL ARR 2025 May Submission 618 Authors

14 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Fine-tuning large language models (LLMs) has demonstrated exceptional performance across a variety of natural language processing (NLP) tasks. However, the increasing scale of these models imposes significant memory overhead during backpropagation. Zeroth-order (ZO) optimization mitigates this issue by estimating gradients from forward passes with Gaussian sampling, but its random sampling strategy introduces variance that scales linearly with the number of parameters, leading to slow convergence and suboptimal performance. We propose a novel gradient estimation framework that computes a guiding vector from Gaussian samples and uses it to direct the perturbations with which gradients are approximated. By incorporating this prior knowledge into the perturbation process, our method converges significantly faster than traditional ZO approaches. We further investigate whether a greedy strategy yields similar improvements in gradient estimation, providing additional insight into the optimization process. Theoretical analysis shows that the proposed gradient estimator aligns more closely with the true gradient direction, thereby improving optimization efficiency. Comprehensive experiments across LLMs of varying scales and architectures demonstrate that our method integrates seamlessly into diverse optimization frameworks, delivering faster convergence and substantial performance improvements over existing methods.
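
Since only the abstract is available here, the following is a minimal Python sketch of what a prior-driven ZO estimator of this kind could look like: a two-point (SPSA-style) finite-difference estimate whose Gaussian perturbation is tilted toward a guiding vector. The function name, the `beta` mixing weight, and the EMA update for the guide are illustrative assumptions, not details taken from the paper.

```python
import torch

def zo_grad_estimate(loss_fn, params, guide, mu=1e-3, beta=0.5):
    # Draw a Gaussian direction, then blend it with the guiding vector
    # so the perturbation carries prior knowledge of the descent direction.
    z = torch.randn_like(params)
    u = beta * guide + (1.0 - beta) * z
    u = u / (u.norm() + 1e-12)  # keep the perturbation scale fixed
    # Two forward passes (no backpropagation) give a finite-difference slope
    # along u; scaling u by that slope yields the gradient estimate.
    loss_plus = loss_fn(params + mu * u)
    loss_minus = loss_fn(params - mu * u)
    return (loss_plus - loss_minus) / (2.0 * mu) * u

# Toy usage: minimize a quadratic; here the guide is maintained as an
# exponential moving average of past estimates (an assumption for the demo).
params = torch.randn(1000)
guide = torch.zeros_like(params)
loss_fn = lambda p: (p ** 2).sum()
lr, ema = 1e-2, 0.9
for step in range(500):
    g_hat = zo_grad_estimate(loss_fn, params, guide)
    guide = ema * guide + (1.0 - ema) * g_hat  # accumulate prior knowledge
    params = params - lr * g_hat
```

Because the estimator needs only forward evaluations, peak memory stays at inference level, which is the motivation the abstract gives for ZO fine-tuning in the first place.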
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Memory-efficient fine-tuning, LLMs, Zeroth-Order optimization
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English
Keywords: Zeroth-order optimization, LLM fine-tuning, Memory-efficient
Submission Number: 618