CompSGD: Robust Comparison-Based Approach for Zeroth-Order Optimization under $(L_0, L_1)$-Smoothness and Heavy-Tailed Noise

ICLR 2026 Conference Submission 21160 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Heavy-tailed noise, generalized smoothness, $(L_0, L_1)$-smoothness, high probability bound, zeroth-order optimization, comparison oracle
Abstract: In modern non-convex optimization, increasing attention is drawn to zeroth-order problems in which the only available information is which of two sets of model parameters is better, without any quantitative characteristic. The data in these problems can be extremely noisy, and the models themselves are so complex that the standard smoothness assumption fails to describe them. Motivated by these challenges, we propose new zeroth-order methods that handle generalized $(L_0,L_1)$-smoothness and severe heavy-tailed noise with bounded $\kappa$-th moment. Using only comparisons of function values at two different points, our $\texttt{MajorityVote-CompSGD}$ method achieves the first known high-probability bound $\tilde{O}\left(\frac{\Delta \sigma^2 d^{9/2}}{\kappa^2}\left(\frac{L_0^3}{\varepsilon^{6}} + \frac{L_1^3}{\varepsilon^{3}}\right)\right)$, $\kappa \in (0,2]$, on the number of comparisons under symmetric independent noise. If function values are available, our $\texttt{minibatch-CompSGD}$ converges to the desired average gradient norm after $\tilde{O}\left(\Delta\sigma^{\frac{\kappa}{\kappa - 1}}\left(\frac{d^{3/2}L_0}{\varepsilon^2} + \frac{d^{3/2}L_1}{\varepsilon}\right)^{\frac{2\kappa - 1}{\kappa - 1}}\right)$, $\kappa \in (1,2]$, function evaluations. In addition, we provide convergence guarantees under Lipschitz noise, parameter-free tuning, and in-expectation bounds with a milder dependence on $d$.
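For context, $(L_0,L_1)$-smoothness is typically defined by allowing the local smoothness constant to grow with the gradient norm, $\|\nabla f(x)-\nabla f(y)\|\le(L_0+L_1\|\nabla f(x)\|)\|x-y\|$, which generalizes the usual Lipschitz-gradient assumption. To make the comparison-oracle setting concrete, the following is a minimal NumPy sketch of a majority-vote, sign-based step driven only by noisy pairwise comparisons; the oracle interface, the Cauchy noise model, and the step size, smoothing radius, and vote count are illustrative assumptions, not the authors' exact $\texttt{MajorityVote-CompSGD}$ algorithm or tuning.

```python
import numpy as np


def comparison_oracle(f, x, y, noise_scale=0.01, rng=None):
    """Report which point looks better: +1 if the noisy value at y exceeds
    the noisy value at x, -1 otherwise. Symmetric heavy-tailed (Cauchy)
    perturbations stand in for bounded-kappa-th-moment noise."""
    rng = rng or np.random.default_rng()
    xi_x, xi_y = noise_scale * rng.standard_cauchy(2)
    return 1.0 if f(y) + xi_y > f(x) + xi_x else -1.0


def majority_vote_comp_step(f, x, gamma=0.05, tau=0.05, m=51, rng=None):
    """One step: draw a random unit direction e, estimate the sign of the
    directional derivative along e by a majority vote over m independent
    comparisons of x and x + tau * e, then take a normalized (sign-based)
    step against it. gamma, tau, and m are illustrative choices only."""
    rng = rng or np.random.default_rng()
    e = rng.standard_normal(x.shape)
    e /= np.linalg.norm(e)
    votes = sum(comparison_oracle(f, x, x + tau * e, rng=rng) for _ in range(m))
    return x - gamma * np.sign(votes) * e


if __name__ == "__main__":
    f = lambda z: 0.5 * np.sum(z ** 2)   # toy smooth objective
    rng = np.random.default_rng(0)
    x = np.full(10, 5.0)
    for _ in range(2000):
        x = majority_vote_comp_step(f, x, rng=rng)
    print("final ||x|| =", np.linalg.norm(x))
```

The majority vote is the noise-robustness mechanism: each comparison may be flipped by heavy-tailed noise, but aggregating an odd number of independent votes recovers the correct sign of the directional derivative with high probability.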
Primary Area: optimization
Submission Number: 21160