Keywords: Optimization, Automatic Differentiation
TL;DR: This paper introduces a new second-order hyperplane search that uses forward-mode automatic differentiation. We include empirical and theoretical results.
Abstract: This paper introduces a second-order hyperplane search, a novel optimization step that generalizes a second-order line search from a line to a $K$-dimensional hyperplane. This, combined with the forward-mode stochastic gradient method, yields a second-order optimization algorithm that consists of forward passes only, completely avoiding the storage overhead of backpropagation. Unlike recent work that relies on directional derivatives (or Jacobian--Vector Products, JVPs), we use hyper-dual numbers to jointly evaluate both directional derivatives and their second-order quadratic terms. As a result, we introduce forward-mode weight perturbation with Hessian information for $K$-dimensional hyperplane search (FoMoH-$K$D). We derive the convergence properties of FoMoH-$K$D and show how it generalizes to Newton's method for $K=D$. We also compare its convergence rate to forward gradient descent (FGD) and show FoMoH-$K$D has an exponential convergence rate compared to FGD's sub-linear convergence for positive definite quadratic functions. We illustrate the utility of this extension and how it might be used to overcome some of the recent challenges of optimizing machine learning models without backpropagation.
Submission Number: 12
Loading