Abstract: First-order optimization methods often perform poorly on ill-conditioned problems and on problems that are not Lipschitz smooth. Recent work introduced the dual preconditioned gradient descent algorithm, which applies a nonlinear preconditioner to the gradient map to improve performance on convex functions satisfying relative smoothness, a generalization of Lipschitz gradient smoothness. In this paper, we significantly extend this prior work by providing a convergence analysis of the algorithm for nonconvex problems that are not Lipschitz smooth. To this end, we exploit recent connections with generalized notions of convexity and smoothness, referred to as anisotropic convexity and anisotropic smoothness, which guarantee convergence to a first-order stationary point. Further, we show that some recently proposed preconditioners, based on power functions or relativistic dynamics, are well-suited to a broad class of objectives. Our experiments demonstrate improved performance with these preconditioners on a variety of nonconvex objectives that are not Lipschitz smooth, including large-scale deep learning tasks.
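To illustrate the idea of nonlinearly preconditioning the gradient map, the following is a minimal sketch, not the submission's implementation. It assumes the commonly stated form of the dual preconditioned update, x ← x − λ·∇φ*(∇f(x)), with a relativistic-style preconditioner ∇φ*(y) = y / sqrt(1 + y²). The test objective f(x) = x⁴/4, the step size, the starting point, and all function names are hypothetical choices made for illustration only.

```python
import math


def grad_f(x):
    # Gradient of the illustrative objective f(x) = x**4 / 4.
    # This gradient is not Lipschitz continuous on the real line.
    return x ** 3


def relativistic_precond(y):
    # Assumed relativistic-style preconditioner: y / sqrt(1 + y^2).
    # Its magnitude is always below 1, so each step is bounded by the step size.
    return y / math.sqrt(1.0 + y * y)


def plain_gd(x0, step, iters):
    # Standard gradient descent, for comparison.
    x = x0
    for _ in range(iters):
        x = x - step * grad_f(x)
    return x


def dual_preconditioned_gd(x0, step, iters):
    # Sketch of the assumed update x <- x - step * precond(grad_f(x)).
    x = x0
    for _ in range(iters):
        x = x - step * relativistic_precond(grad_f(x))
    return x


if __name__ == "__main__":
    # Only a few plain GD steps are run because the iterates blow up quickly.
    print("plain GD after 3 steps:", plain_gd(10.0, 0.5, 3))
    print("preconditioned after 200 steps:", dual_preconditioned_gd(10.0, 0.5, 200))
```

With this step size, plain gradient descent overshoots on its first step from x = 10, while the preconditioned iterates move toward the minimizer at 0 in steps of bounded length. This is a toy one-dimensional illustration and says nothing about the convergence rates established in the paper.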
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Optimization (e.g., convex and non-convex optimization)
Supplementary Material: zip