Variance Reduction of Stochastic Hypergradient Estimation by Mixed Fixed-Point Iteration

TMLR Paper 4546 Authors

24 Mar 2025 (modified: 23 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: The hypergradient represents how the hyperparameter of an optimization problem (the inner problem) changes an outer cost through the optimized inner parameter, and it plays a crucial role in hyperparameter optimization, meta-learning, and data influence estimation. This paper studies hypergradient computation involving a stochastic inner problem, a typical machine learning setting where the empirical loss is estimated from minibatches. Stochastic hypergradient estimation requires estimating products of Jacobian matrices of the inner iteration. Current methods suffer from large estimation variance because they rely on a single specific sequence of Jacobian samples to estimate this product. This paper overcomes this problem by \emph{mixing} two different stochastic hypergradient estimation methods that use distinct sequences of Jacobian samples. Furthermore, we show that the proposed method achieves almost sure convergence to the true hypergradient through the stochastic Krasnosel'ski\u{\i}-Mann iteration. Theoretical analysis demonstrates that, compared to existing approaches, our method attains lower asymptotic variance bounds while maintaining comparable computational complexity. Empirical evaluations on synthetic and real-world tasks verify our theoretical results and demonstrate superior variance reduction over existing methods.
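To make the setting concrete, the following is a minimal, hypothetical sketch (not the paper's algorithm) of the fixed-point view of hypergradient estimation that the abstract refers to: the inverse-Hessian-vector product inside the hypergradient is the fixed point of an affine map, and a stochastic Krasnosel'ski\u{\i}-Mann iteration averages the current iterate with a minibatch evaluation of that map. The quadratic toy problem and all names and constants (A_samples, T_minibatch, alpha, beta_k) are illustrative assumptions, not quantities from the paper.

```python
# Illustrative sketch only: stochastic Krasnosel'skii-Mann fixed-point iteration
# for the inverse-Hessian-vector product v = H^{-1} g inside a hypergradient.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200

# Per-sample curvature matrices; the full inner-problem Hessian is their average.
A_samples = np.zeros((n, d, d))
for i in range(n):
    M = rng.standard_normal((d, d))
    A_samples[i] = np.eye(d) + 0.1 * M @ M.T

H = A_samples.mean(axis=0)        # inner-problem Hessian (assumed positive definite)
b = rng.standard_normal(d)        # cross derivative of the inner objective (toy value)
g = rng.standard_normal(d)        # outer gradient at the inner solution (toy value)

# Reference values: v = H^{-1} g and the hypergradient contribution v^T b.
v_exact = np.linalg.solve(H, g)
hg_exact = v_exact @ b

# Fixed-point map T(v) = v - alpha * (H v - g); its unique fixed point is v_exact.
alpha = 0.1

def T_minibatch(v, batch):
    """Stochastic evaluation of the fixed-point map using a minibatch Hessian."""
    H_hat = A_samples[batch].mean(axis=0)
    return v - alpha * (H_hat @ v - g)

v = np.zeros(d)
for k in range(20000):
    batch = rng.integers(0, n, size=8)
    beta_k = (k + 1) ** -0.75                                # diminishing relaxation
    v = (1.0 - beta_k) * v + beta_k * T_minibatch(v, batch)  # KM averaging step

print(f"exact: {hg_exact:.4f}   stochastic KM estimate: {v @ b:.4f}")
```

With an unbiased minibatch Hessian and diminishing relaxation weights, the KM averaging damps the sampling noise around the fixed point; the paper's contribution concerns mixing two estimators built from distinct sequences of Jacobian samples within such an iteration, which this toy sketch does not reproduce.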
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Samuel_Vaiter1
Submission Number: 4546