Abstract: We analyze nonlinearly preconditioned gradient methods for solving smooth minimization problems. We introduce a generalized smoothness property, based on the notion of abstract convexity, that is broader than Lipschitz smoothness, and we provide sufficient first- and second-order conditions for it. Notably, our framework encompasses algorithms associated with the gradient clipping method and yields novel insights into the class of $(L_0,L_1)$-smooth functions that has recently received widespread interest, allowing us to extend beyond already established methods. We investigate the convergence of the proposed method in both the convex and nonconvex settings.
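As an illustrative sketch (not taken from the submission itself), gradient clipping can be viewed as a nonlinearly preconditioned gradient step, and the $(L_0,L_1)$-smoothness condition admits a standard second-order formulation; the symbols $\gamma$, $c$, $L_0$, $L_1$ below are generic and introduced here only for illustration.

```latex
% Illustrative sketch; $\gamma$ (step size), $c$ (clipping radius), $L_0$, $L_1$ are generic symbols.
% Clipped gradient step, read as a nonlinearly preconditioned gradient update:
\[
  x^{k+1} = x^{k} - \gamma \, \frac{\nabla f(x^{k})}{\max\{1,\ \|\nabla f(x^{k})\| / c\}} ,
\]
% i.e., the nonlinear preconditioner $p(g) = g / \max\{1, \|g\|/c\}$ is applied to the gradient.
% Standard $(L_0,L_1)$-smoothness condition for twice differentiable $f$:
\[
  \|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x .
\]
```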
Lay Summary: Gradient descent (GD) is one of the core methods for training models in modern machine learning. However, when the cost function is not "smooth" enough, gradient descent can become inefficient, requiring very small steps to find a solution.
Our research looks at a smarter way to apply GD by reshaping the path taken by the algorithm using what is called nonlinear preconditioning. To do this, we consider a different way of thinking about smoothness that goes beyond the standard definitions used in the optimization literature. This allows us to cover a broader class of problems, including some that have recently attracted attention for being hard to optimize but important in practice. We also show how the proposed framework includes popular techniques like "gradient clipping" and other similar methods, and extends them to new scenarios.
Primary Area: Optimization
Keywords: nonconvex optimization, generalized smoothness, first-order methods
Submission Number: 7377