Abstract: Gauss-Newton (a.k.a. prox-linear) directions can be computed by solving an
optimization subproblem that trades off between a partial linearization of the
objective function and a proximity term. In this paper, we study the possibility
of leveraging the convexity of this subproblem in order to instead solve the
corresponding dual. As we show, the dual can be advantageous when the number of
network outputs is smaller than the number of network parameters. We propose a
conjugate gradient algorithm for solving the dual that integrates seamlessly with
autodiff through the use of linear operators and handles dual constraints. We
prove that this algorithm produces descent directions when run for any number
of steps. Finally, we study empirically the advantages and current limitations
of our approach compared to various popular deep learning solvers.
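To illustrate the dimension argument behind the dual (not the paper's actual algorithm), consider the special case of a squared loss: the Gauss-Newton/prox-linear step for 0.5 * ||f(w) - y||^2 with proximity parameter eta solves (J^T J + I/eta) d = -J^T r in parameter space, or equivalently a k x k system (J J^T + I/eta) lam = -r in output space with d = J^T lam, which is cheaper when the number of outputs k is smaller than the number of parameters p. The sketch below is a minimal, hypothetical JAX example of this idea, using matrix-free Jacobian products (jax.jvp / jax.vjp) and conjugate gradient; the function name gauss_newton_step and the squared-loss specialization are illustrative assumptions, and dual constraints are omitted.

```python
# Hypothetical sketch (not the paper's code): dual-space Gauss-Newton step
# for a squared loss, solved matrix-free with conjugate gradient in the
# k-dimensional output space.
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def gauss_newton_step(f, params, residual, eta):
    """Gauss-Newton direction for 0.5 * ||f(params) - y||^2 via the dual system.

    f        : maps parameters (R^p) to network outputs (R^k)
    residual : f(params) - y, shape (k,)
    eta      : proximity (step size) parameter
    """
    # Matrix-free operator v -> (J J^T + I/eta) v, acting on output space.
    def matvec(v):
        _, vjp_fn = jax.vjp(f, params)
        (jt_v,) = vjp_fn(v)                        # J^T v (parameter space)
        _, jjt_v = jax.jvp(f, (params,), (jt_v,))  # J J^T v (output space)
        return jjt_v + v / eta

    # Solve the small k x k dual system with conjugate gradient.
    lam, _ = cg(matvec, -residual)

    # Map the dual solution back to a parameter-space direction d = J^T lam.
    _, vjp_fn = jax.vjp(f, params)
    (direction,) = vjp_fn(lam)
    return direction
```

Because CG only needs products with J and J^T, the k x k matrix is never formed explicitly; this is the sense in which a dual solver can integrate with autodiff through linear operators.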
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lorenzo_Orecchia1
Submission Number: 2869