Lower Bounds of Uniform Stability in Gradient-Based Bilevel Algorithms for Hyperparameter Optimization

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 poster, CC BY 4.0
Keywords: Uniform stability, Lower bound, Hyperparameter optimization, Bilevel programming
TL;DR: We establish uniform stability lower bounds for representative gradient-based bilevel hyperparameter optimization algorithms.
Abstract: Gradient-based bilevel programming leverages unrolling differentiation (UD) or the implicit function theorem (IFT) to solve hyperparameter optimization (HO) problems, and has proven effective and scalable in practice. To understand the generalization behavior of these algorithms, existing works establish upper bounds on their uniform stability, but the tightness of these bounds remains unclear. To this end, this paper establishes stability lower bounds for UD-based and IFT-based algorithms. A central technical challenge arises from the dependency of each outer-level update on the concurrent stage of the inner optimization in bilevel programming. To address this problem, we introduce lower-bounded expansion properties that characterize the instability of update rules and can serve as general tools for lower-bound analysis. In the context of HO, these properties guarantee hyperparameter divergence at the outer level and control the Lipschitz constant of the inner-level output. Guided by these insights, we construct a quadratic example that yields tight lower bounds for the UD-based algorithm and meaningful bounds for a representative IFT-based algorithm. Our tight result indicates that uniform stability has reached its limit in stability analysis for the UD-based algorithm.
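For orientation, the sketch below records the standard bilevel HO formulation and the uniform stability notion the abstract refers to; the notation (validation/training losses, the stability parameter $\beta$) is assumed standard and is not taken from the paper itself.

```latex
% Minimal sketch (assumed standard notation, not from the paper):
% bilevel hyperparameter optimization and uniform stability.
\begin{align*}
  % Outer level: choose hyperparameters \lambda to minimize validation loss
  \min_{\lambda}\; F(\lambda) &:= \mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\lambda), \lambda\bigr) \\
  % Inner level: model parameters fitted on training data for fixed \lambda
  \text{s.t.}\quad w^{*}(\lambda) &\in \arg\min_{w}\; \mathcal{L}_{\mathrm{tr}}(w, \lambda).
\end{align*}
% An algorithm A is \beta-uniformly stable if, for any two datasets S, S'
% differing in a single example and any evaluation point z,
%   \bigl|\, \ell(A(S), z) - \ell(A(S'), z) \,\bigr| \le \beta .
% Upper bounds on \beta yield generalization guarantees; this paper asks
% how small \beta can be, i.e., lower bounds on the achievable stability.
```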
Supplementary Material: zip
Primary Area: Learning theory
Submission Number: 9487