Non-Uniform Smoothness for Gradient Descent

Published: 12 Feb 2024, Last Modified: 12 Feb 2024. Accepted by TMLR.
Abstract: The analysis of gradient descent-type methods typically relies on the Lipschitz continuity of the objective gradient. This generally requires an expensive hyperparameter tuning process to appropriately calibrate a stepsize for a given problem. In this work, we introduce a local first-order smoothness oracle (LFSO), which generalizes the Lipschitz continuous gradient smoothness condition and is applicable to any twice-differentiable function. We show that this oracle can encode all relevant problem information for tuning stepsizes for a suitably modified gradient descent method, and we give global and local convergence results. We also show that LFSOs in this modified first-order method can yield global linear convergence rates for non-strongly convex problems with extremely flat minima, and thus improve over the lower bound on rates achievable by general (accelerated) first-order methods.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
- More details given about the computational cost of evaluating the oracle (Section 2, with new Appendix A)
- Clearer discussion of the next steps required for more thorough numerical testing (introduction and conclusion)
Assigned Action Editor: ~Anastasios_Kyrillidis2
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1829