Cubic regularized subspace Newton for non-convex optimization
TL;DR: We present global convergence results for optimizing smooth non-convex problems that cover various coordinate sampling schemes. This unified framework includes not only the cubic subspace Newton method, but also inexact Newton methods.
Abstract: This paper addresses the optimization problem of minimizing non-convex continuous functions,
a problem highly relevant in high-dimensional machine learning scenarios, particularly those involving over-parameterization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying the cubic regularization of Newton's method in random subspaces. This approach effectively reduces the computational complexity associated with utilizing second-order information, making it applicable in higher-dimensional scenarios.
Theoretically, we establish strong global convergence guarantees for non-convex functions to a stationary point, with interpolating rates for arbitrary subspace sizes and
allowing inexact curvature estimation, starting from an arbitrary initialization.
When increasing the subspace size, our complexity matches the $\mathcal{O}(\epsilon^{-3/2})$
rate of the full Newton's method with cubic regularization.
Additionally, we propose an adaptive sampling scheme ensuring the exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, without requiring to sample all coordinates.
Experimental results demonstrate substantial speed-ups achieved by SSCN
compared to conventional first-order methods and other second-order subspace methods.
Submission Number: 288
Loading