Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points

TMLR Paper3314 Authors

09 Sept 2024 (modified: 17 Sept 2024) | Under review for TMLR | CC BY 4.0
Abstract: Applying Gradient Descent with fixed Momentum (GDM) and a fixed step size to minimize a (possibly nonconvex) quadratic function is equivalent to running the Power Method with fixed Momentum (PMM) on the gradients. Thus, valuable eigen-information is available via GDM. A new algorithm called Gradient Descent with a Kick (GD-Kick) is presented, which exploits the 'free' eigen-information available from the GDM-PMM connection, and occasionally takes a locally adaptive, long step. Numerical experiments show the advantages of GD-Kick compared with vanilla GD, particularly near saddle points.
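The GDM-PMM equivalence stated in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the paper. It assumes the quadratic f(x) = 0.5 x^T A x - b^T x with symmetric A, the heavy-ball form of momentum x_{k+1} = x_k - alpha g_k + beta (x_k - x_{k-1}), and the convention x_{-1} = x_0. Under those assumptions, multiplying the update by A and subtracting b gives g_{k+1} = ((1 + beta) I - alpha A) g_k - beta g_{k-1}, which is an unnormalized momentum power iteration on the gradients. The function names `gdm_gradients`, `pmm_iterates`, and `rayleigh_quotient` are hypothetical, and the exact PMM variant used by the authors may differ.

```python
import numpy as np

def gdm_gradients(A, b, x0, alpha, beta, steps):
    """Run heavy-ball gradient descent on f(x) = 0.5 x^T A x - b^T x
    and return the gradients g_k = A x_k - b for k = 0..steps."""
    x_prev = x0.copy()          # assumed convention: x_{-1} = x_0
    x = x0.copy()
    grads = [A @ x - b]
    for _ in range(steps):
        g = A @ x - b
        x_next = x - alpha * g + beta * (x - x_prev)
        x_prev, x = x, x_next
        grads.append(A @ x - b)
    return np.array(grads)

def pmm_iterates(A, g0, alpha, beta, steps):
    """Run the unnormalized momentum power iteration
    g_{k+1} = M g_k - beta g_{k-1} with M = (1 + beta) I - alpha A."""
    n = A.shape[0]
    M = (1.0 + beta) * np.eye(n) - alpha * A
    g_prev = g0.copy()          # g_{-1} = g_0 follows from x_{-1} = x_0
    g = g0.copy()
    out = [g]
    for _ in range(steps):
        g_next = M @ g - beta * g_prev
        g_prev, g = g, g_next
        out.append(g)
    return np.array(out)

def rayleigh_quotient(A, g):
    """Eigenvalue estimate g^T A g / g^T g for a nonzero vector g."""
    return float(g @ (A @ g) / (g @ g))

if __name__ == "__main__":
    # Small nonconvex example (illustrative values, not from the paper).
    A = np.diag([-1.0, 0.5, 2.0])
    b = np.zeros(3)
    x0 = np.ones(3)
    alpha, beta, steps = 0.1, 0.1, 200

    G = gdm_gradients(A, b, x0, alpha, beta, steps)
    P = pmm_iterates(A, G[0], alpha, beta, steps)
    print("sequences agree:", np.allclose(G, P))
    print("Rayleigh quotient of last gradient:", rayleigh_quotient(A, G[-1]))
```

In this example the step size is small enough that the leftmost eigenvalue of A corresponds to the dominant eigenvalue of the iteration matrix, so the Rayleigh quotient of the gradient drifts toward the leftmost eigenvalue as the iteration proceeds. The gradients are left unnormalized here to make the comparison with GDM direct; a practical power method would rescale them.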
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Lorenzo_Orecchia1
Submission Number: 3314