Refining Adaptive Zeroth-Order Optimization at Ease

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We introduce R-AdaZO, a novel approach that improves the convergence of existing adaptive ZO methods by effectively leveraging moment information.
Abstract: Zeroth-order (ZO) optimization plays an essential role in scenarios where gradient information is inaccessible or unaffordable, such as black-box systems and resource-constrained environments. While existing adaptive methods such as ZO-AdaMM have shown promise, they are fundamentally limited by their underutilization of moment information during optimization, which often results in suboptimal convergence. To overcome these limitations, this paper introduces *Refined Adaptive Zeroth-Order Optimization* (R-AdaZO). Specifically, we first show the untapped variance reduction effect of the first moment estimate on ZO gradient estimation, which improves the accuracy and stability of ZO updates. We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape, enabling more effective scaling of ZO updates. We present rigorous theoretical analysis covering **_(a)_** *the first analysis* of the variance reduction provided by the first moment estimate in ZO optimization, **_(b)_** *the improved second moment estimate*, which more accurately approximates its variance-free ideal, **_(c)_** *the first variance-aware convergence framework* for adaptive ZO methods, which may be of independent interest, and **_(d)_** *the faster convergence* of R-AdaZO compared to existing baselines such as ZO-AdaMM. Our extensive experiments, including synthetic problems, black-box adversarial attacks, and memory-efficient fine-tuning of large language models (LLMs), further verify the superior convergence of R-AdaZO, indicating that it offers an improved solution for real-world ZO optimization challenges.
Lay Summary: **(1) Problem:** In many real-world AI applications, we can't directly access the "gradient" information that tells us how to improve a model. This is common in black-box systems (where we only see inputs and outputs) or on devices with limited computing power. Existing methods for optimizing these systems, while useful, often struggle to converge quickly because they don't fully leverage the information they *do* gather about the model's behavior. **(2) Solution:** We introduce R-AdaZO, a new optimization technique designed to overcome these limitations. Our key insight is that even without direct gradients, we can significantly improve the accuracy and stability of our updates by better utilizing "moment" information: the first moment estimate (a kind of running average of update directions) can reduce the noise in our gradient estimates. We then use these more accurate estimates to refine how we scale our updates, allowing the optimization process to better adapt to the problem's unique characteristics. **(3) Impact:** R-AdaZO provides a more efficient and robust way to optimize complex AI systems when traditional gradient information is unavailable. This leads to faster and more reliable training for a wide range of applications, from making AI models more resilient to attacks to efficiently fine-tuning large language models on resource-constrained devices. Our work offers a significant step forward in tackling real-world optimization challenges in black-box and resource-limited environments.
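To make the core idea above concrete, here is a minimal sketch of an R-AdaZO-style update loop, based only on the description in the abstract: the second moment estimate is built from the variance-reduced first moment `m` rather than from the raw ZO gradient estimate. The two-point Gaussian estimator, Adam-style bias correction, and the function names `zo_gradient` / `r_adazo_sketch` are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_samples=1, rng=None):
    """Two-point zeroth-order gradient estimate with Gaussian perturbations
    (a standard estimator; the paper may use a different smoothing scheme)."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_samples

def r_adazo_sketch(f, x0, steps=1000, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical R-AdaZO-style loop: unlike ZO-AdaMM, the second moment v is
    updated with the variance-reduced first moment m instead of the raw estimate g."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = zo_gradient(f, x)
        m = beta1 * m + (1 - beta1) * g       # first moment: variance-reduced estimate
        v = beta2 * v + (1 - beta2) * m * m   # second moment refined with m, not g
        m_hat = m / (1 - beta1 ** t)          # Adam-style bias correction (assumed)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Usage example: minimize a simple quadratic treated as a black-box objective.
x_star = r_adazo_sketch(lambda x: np.sum(x ** 2), x0=np.ones(10))
```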
Primary Area: Optimization->Zero-order and Black-box Optimization
Keywords: Adaptive Method, Zeroth-Order Optimization, Variance Reduction, Convergence
Submission Number: 8390