A Multilevel Low-Rank Newton Method with Super-linear Convergence Rate and its Application to Non-convex Problems
Abstract: Second-order methods can address the shortcomings of first-order methods for the optimization of large-scale machine learning models.
However, second-order methods incur significantly higher computational costs because they must compute second-order information. Randomized subspace methods mitigate some of these costs by computing search directions in lower-dimensional spaces. Even though super-linear convergence rates have been observed empirically, it has not been possible to show rigorously that these variants of second-order methods can indeed achieve such fast rates.
Moreover, it is unclear whether subspace methods are efficient in non-convex settings.
To address these shortcomings, we develop a link between multigrid optimization methods and low-rank Newton methods that enables us to rigorously prove the super-linear rates of stochastic low-rank Newton methods. Our method does not require any computations in the original model dimension. We further propose a truncated version of the method that is capable of solving high-dimensional non-convex problems. Preliminary numerical experiments show that our method escapes saddle points more reliably than state-of-the-art first-order methods and thus attains lower training errors.
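To make the subspace idea concrete, the following is a minimal NumPy sketch of a generic randomized subspace (sketched) Newton step, in which the Newton system is restricted to an r-dimensional random subspace so that only an r x r linear system is solved. It illustrates the class of methods discussed above, not the multilevel method proposed in this submission; the sketch size r, the Gaussian sketching matrix, the damping constant, and the test objective are illustrative assumptions, and the Hessian-times-sketch product here is still formed in the original dimension.

```python
import numpy as np

def subspace_newton_step(grad, hess, x, r=20, lr=1.0):
    """One generic randomized subspace (low-rank) Newton step (illustrative).

    grad(x) returns the gradient in R^n and hess(x) the Hessian in R^{n x n};
    a Hessian-vector product could be substituted to avoid forming the matrix.
    A random sketch S in R^{n x r} restricts the Newton system to an
    r-dimensional subspace, so only an r x r system is solved.
    """
    n = x.shape[0]
    S = np.random.randn(n, r) / np.sqrt(r)        # Gaussian sketching matrix
    g = grad(x)
    HS = hess(x) @ S                              # n x r: Hessian applied to the sketch
    reduced_H = S.T @ HS + 1e-8 * np.eye(r)       # r x r reduced Hessian (damped)
    reduced_g = S.T @ g                           # r-dimensional reduced gradient
    d = np.linalg.solve(reduced_H, reduced_g)     # solve the small Newton system
    return x - lr * (S @ d)                       # lift the direction back to R^n

# Usage on a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x (hypothetical test case)
n = 1000
A = np.random.randn(n, n)
A = A @ A.T / n + np.eye(n)
b = np.random.randn(n)
x = np.zeros(n)
for _ in range(50):
    x = subspace_newton_step(lambda z: A @ z - b, lambda z: A, x, r=50)
```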
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Stephen_Becker1
Submission Number: 6223