Keywords: hard-thresholding, zeroth-order, noisy optimization, variance reduction
Abstract: Hard-thresholding gradient descent is a primary method for solving $\ell_0$-constrained optimization problems to achieve sparsity. In the black-box setting, where only function outputs are accessible, recent work has introduced stochastic and variance-reduced zeroth-order hard-thresholding algorithms, establishing both algorithmic and theoretical feasibility and specifically addressing the inherent conflict between zeroth-order gradient estimation and hard thresholding. In practice, however, function outputs often contain noise, which exacerbates this conflict and undermines the robustness of these algorithms' guarantees. In this work, we investigate the performance of zeroth-order hard-thresholding algorithms under noisy function evaluations, providing a convergence analysis for the stochastic version. Furthermore, we theoretically demonstrate that zeroth-order hard-thresholding variance-reduction algorithms, which leverage historical gradients, inherently lower the tolerable noise upper bound. Contrary to common presumptions, our findings reveal that variance reduction techniques fail to enhance performance in this setting and can even lead to worse feasibility than simpler methods without such techniques. These theoretical insights are validated through experiments on sparse regression, black-box adversarial attacks, and biological gene expression data.
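For context, a minimal sketch of the kind of stochastic zeroth-order hard-thresholding loop the abstract describes, not the authors' implementation: it assumes a standard two-point Gaussian-smoothing gradient estimator and a top-$k$ hard-thresholding projection, with a noisy black-box loss; all function names and parameter values here are illustrative.

```python
import numpy as np

def zo_gradient_estimate(f, x, num_dirs=10, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate along random Gaussian directions.

    Uses only (possibly noisy) evaluations of f; no analytic gradient is needed.
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        # Finite-difference slope along u, projected back onto u.
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    z = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    z[idx] = x[idx]
    return z

def zo_hard_thresholding(f, x0, k, step=0.1, iters=200, num_dirs=10, mu=1e-3):
    """Plain zeroth-order hard-thresholding descent (no variance reduction)."""
    x = hard_threshold(x0.copy(), k)
    for _ in range(iters):
        g = zo_gradient_estimate(f, x, num_dirs=num_dirs, mu=mu)
        x = hard_threshold(x - step * g, k)  # gradient step, then top-k projection
    return x

# Toy sparse regression with noisy function outputs (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
noisy_loss = lambda x: np.linalg.norm(A @ x - b) ** 2 + rng.normal(scale=1e-3)
x_hat = zo_hard_thresholding(noisy_loss, np.zeros(100), k=5)
```

The evaluation noise added in `noisy_loss` stands in for the noisy black-box outputs studied in the paper; the variance-reduced variants the abstract analyzes would additionally reuse historical gradient estimates inside the loop.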
Supplementary Material: zip
Primary Area: optimization
Submission Number: 14920