Optimizer Selection Based On Function-Proxy Proximity

04 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Optimization, artificial intelligence, descent rate, nonconvex, bound, optimizer, second order, Newton, arc, regret, proxy, meta-optimizer, meta-algorithm, optimizer-combination
TL;DR: Modern optimizers neglect the rate at which the loss function's Hessian shifts, yet this rate significantly affects algorithm performance. Based on this information, we develop a quality metric for optimizers and a meta-algorithm for dynamic optimizer selection.
Abstract: Many machine learning problems involve the difficult task of training a model to fit the training data; this task is especially challenging for nonconvex problems. Many model training algorithms have been proposed, but it is often difficult to determine which algorithm is best suited to a given machine learning problem. To contend with this challenge, we study the effect of loss function curvature shift on optimizers' proxies of the loss function, and obtain a bound on the rate at which a very large family of optimizers (including all of the most prevalent ones) descend towards the loss function's minimum, while making only relatively weak assumptions about the loss function. Uniquely, our bound is tight even in parameter subspaces in which the loss function is concave, which have been shown to bear potential for fast descent while being neglected by existing convergence rate bounds. We demonstrate the applicability of our bound by developing a meta-algorithm for optimizer selection based on it, and validate our meta-algorithm experimentally.
Primary Area: optimization
Submission Number: 2170
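As a rough illustration of the dynamic-selection idea described in the abstract, the sketch below periodically scores candidate PyTorch optimizers and switches to the one descending fastest. The score used here (average recent loss decrease per step), the toy problem, and the names `run_steps`, `window`, and `probe` are hypothetical stand-ins; the paper's actual selection criterion is the descent-rate bound it derives, which is not reproduced here.

```python
# Hypothetical sketch of a meta-algorithm for dynamic optimizer selection.
# The scoring rule (recent loss decrease per step) is a stand-in for the
# paper's descent-rate bound, not the authors' actual metric.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy nonconvex regression problem.
X = torch.randn(256, 10)
y = torch.sin(X.sum(dim=1, keepdim=True))

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

# Candidate optimizers share the same model parameters.
candidates = {
    "sgd": torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9),
    "adam": torch.optim.Adam(model.parameters(), lr=1e-3),
}

def run_steps(opt, n_steps):
    """Run n_steps with the given optimizer; return average loss decrease per step."""
    start = loss_fn(model(X), y).item()
    for _ in range(n_steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    end = loss_fn(model(X), y).item()
    return (start - end) / n_steps

# Meta-loop: every `window` steps, score each candidate on a short probe
# and keep the one with the fastest recent descent. For simplicity the
# probe rolls back model weights but not optimizer internal state.
window, probe = 50, 5
current = "sgd"
for round_ in range(10):
    run_steps(candidates[current], window)
    scores = {}
    for name, opt in candidates.items():
        snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
        scores[name] = run_steps(opt, probe)
        model.load_state_dict(snapshot)  # roll back the probe steps
    current = max(scores, key=scores.get)
    print(f"round {round_}: selected {current}, scores {scores}")
```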