Abstract: Stochastic bilevel optimization (SBO) has recently been integrated into many machine learning paradigms, including hyperparameter optimization, meta learning, and reinforcement learning. Along with this wide range of applications, there have been abundant studies concerning the computational behaviors of SBO. However, the generalization guarantees of SBO methods are far less understood through the lens of statistical learning theory. In this paper, we provide a systematic generalization analysis of first-order gradient-based bilevel optimization methods. First, we establish quantitative connections between the on-average argument stability and the generalization gap of SBO methods. Then, we derive upper bounds on the on-average argument stability for single-timescale stochastic gradient descent (SGD) and two-timescale SGD, where three settings (nonconvex-nonconvex (NC-NC), convex-convex (C-C), and strongly-convex-strongly-convex (SC-SC)) are considered, respectively. Experimental analysis validates our theoretical findings. Compared with previous algorithmic stability analyses, our results do not require re-initialization of the inner-level parameters before each iteration and are suited to more general objective functions.
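For reference, a minimal sketch of the standard SBO formulation and the SGD iterations referred to above; the notation here is an assumption for illustration rather than taken from the paper. With outer-level parameters $x$ and inner-level parameters $y$, the problem reads
$$\min_{x}\; F(x) := \mathbb{E}_{\xi}\!\left[f\big(x, y^{*}(x); \xi\big)\right] \quad \text{s.t.} \quad y^{*}(x) \in \arg\min_{y}\; \mathbb{E}_{\zeta}\!\left[g(x, y; \zeta)\right],$$
and the gradient-based iterations take the form
$$y_{t+1} = y_t - \beta_t \nabla_y g(x_t, y_t; \zeta_t), \qquad x_{t+1} = x_t - \alpha_t \widehat{\nabla} F(x_t; \xi_t),$$
where $\widehat{\nabla} F$ denotes a stochastic estimate of the hypergradient. Single-timescale methods couple the step sizes, $\alpha_t \asymp \beta_t$, whereas two-timescale methods let the outer step size decay faster than the inner one.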