Understanding Learning Dynamics of Zeroth-Order Optimization

Zhe Li; Bicheng Ying; Zidong Liu; Haibo Yang

Published: 02 Mar 2026, Last Modified: 19 Mar 2026, Sci4DL 2026, CC BY 4.0

Keywords: Learning Dynamics, Zeroth-Order Optimization, Kernel

Abstract: We derive the one-step learning dynamics of zeroth-order (ZO) SGD and show that the empirical Neural Tangent Kernel (eNTK) naturally emerges as the key term governing the learning behavior. Inspecting the eNTK produced by ZO-SGD reveals that each element is the inner product of neural tangent vectors projected onto a random low-dimensional subspace. Thus, by invoking the Johnson-Lindenstrauss Lemma, our analysis shows that the fidelity of the ZO eNTK is governed primarily by the number of perturbations. Crucially, the approximation error depends on the model's output size rather than on its massive parameter dimension. This dimension-free property provides a theoretical justification for the scalability of ZO methods to LLM fine-tuning tasks. We believe that this kernel-based framework offers a new perspective for understanding ZO methods within the context of learning dynamics.
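A minimal NumPy sketch of the kernel view described in the abstract follows. It assumes Gaussian perturbations, a two-point central-difference estimator, a linear model f(x; theta) = W x and a squared loss; these are illustrative choices, and the paper's exact estimator and setup may differ. All function names (`zo_gradient`, `entk`, `zo_entk`, `linear_jacobian`, `relative_kernel_error`) are hypothetical. With q perturbation directions stacked as the columns of Z, the ZO gradient is approximately (1/q) Z Z^T grad_theta L, so the one-step change of the output at an observed input x_o is approximately -eta (1/q) (J_o Z)(J_u Z)^T dL/df. That product is the eNTK J_o J_u^T with every tangent vector (row of a Jacobian) projected onto a random q-dimensional subspace.

```python
import numpy as np


def zo_gradient(loss_fn, theta, Z, mu=1e-3):
    """Two-point (central-difference) ZO gradient estimate (illustrative).

    Z is a d x q matrix whose columns are random perturbation directions.
    Returns (1/q) * sum_i [(L(theta + mu z_i) - L(theta - mu z_i)) / (2 mu)] z_i.
    """
    q = Z.shape[1]
    grad = np.zeros_like(theta)
    for i in range(q):
        z = Z[:, i]
        slope = (loss_fn(theta + mu * z) - loss_fn(theta - mu * z)) / (2.0 * mu)
        grad += slope * z
    return grad / q


def entk(J_o, J_u):
    """First-order eNTK: inner products of tangent vectors (rows of the Jacobians)."""
    return J_o @ J_u.T


def zo_entk(J_o, J_u, Z):
    """ZO eNTK: the same inner products after random projection by Z, scaled by 1/q."""
    q = Z.shape[1]
    return (J_o @ Z) @ (J_u @ Z).T / q


def linear_jacobian(x, k):
    """Jacobian of f(x; theta) = W x w.r.t. theta = W.ravel() (row-major): k x (k*p)."""
    return np.kron(np.eye(k), x[None, :])


# --- One ZO-SGD step on a linear model: the output change equals -eta * K_zo @ g ---
rng = np.random.default_rng(0)
k, p, q, eta = 3, 20, 64, 0.1
d = k * p
theta = rng.standard_normal(d)
x_u, y_u = rng.standard_normal(p), rng.standard_normal(k)  # training example
x_o = rng.standard_normal(p)                                # observed example
J_u, J_o = linear_jacobian(x_u, k), linear_jacobian(x_o, k)


def loss(th):
    """Squared loss on (x_u, y_u); f(x_u; th) = J_u @ th for a linear model."""
    return 0.5 * np.sum((J_u @ th - y_u) ** 2)


g = J_u @ theta - y_u                        # dL/df evaluated at x_u
Z = rng.standard_normal((d, q))              # Gaussian perturbation directions
theta_new = theta - eta * zo_gradient(loss, theta, Z)
delta_f = J_o @ theta_new - J_o @ theta      # actual change of f(x_o)
predicted = -eta * zo_entk(J_o, J_u, Z) @ g  # kernel form of that change
print("max |delta_f - predicted|:", np.max(np.abs(delta_f - predicted)))


# --- JL-style behavior: kernel error vs number of perturbations q and dimension d ---
def relative_kernel_error(d, q, k=10, trials=10, seed=1):
    """Mean of ||K_zo - K||_F / (||J_o||_F ||J_u||_F) over random Gaussian Jacobians."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        J_o = rng.standard_normal((k, d))
        J_u = rng.standard_normal((k, d))
        Z = rng.standard_normal((d, q))
        diff = zo_entk(J_o, J_u, Z) - entk(J_o, J_u)
        errs.append(np.linalg.norm(diff) / (np.linalg.norm(J_o) * np.linalg.norm(J_u)))
    return float(np.mean(errs))


for dim in (500, 5000):
    print(dim, [round(relative_kernel_error(dim, qq), 4) for qq in (10, 100, 1000)])
```

For a quadratic loss the central difference is exact up to floating-point error, so the first check isolates the kernel identity rather than finite-difference bias. In the second part the normalized error should shrink roughly like 1/sqrt(q) and stay at a similar level for both parameter dimensions, consistent with the dimension-free argument; the exact printed values depend on the random seed.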

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Style Files: I have used the style files.

Submission Number: 7