Abstract: Gauss-Newton (a.k.a. prox-linear) directions can be computed by solving an
optimization subproblem that trades off between a partial linearization of the
objective function and a proximity term. In this paper, we study the possibility
of leveraging the convexity of this subproblem in order to instead solve the
corresponding dual. As we show, the dual can be advantageous when the number of
network outputs is smaller than the number of network parameters. We propose a
conjugate gradient algorithm for solving the dual that integrates seamlessly with
autodiff through the use of linear operators and handles dual constraints. We
prove that this algorithm produces descent directions when run for any number
of steps. Finally, we study empirically the advantages and current limitations
of our approach compared to various popular deep learning solvers.
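To illustrate the dimension argument behind the dual (not the paper's actual algorithm), consider the special case of a squared loss: the Gauss-Newton/prox-linear step for 0.5 * ||f(w) - y||^2 with proximity parameter eta solves (J^T J + I/eta) d = -J^T r in parameter space, or equivalently a k x k system (J J^T + I/eta) lam = -r in output space with d = J^T lam, which is cheaper when the number of outputs k is smaller than the number of parameters p. The sketch below is a minimal, hypothetical JAX example of this idea, using matrix-free Jacobian products (jax.jvp / jax.vjp) and conjugate gradient; the function name gauss_newton_step and the squared-loss specialization are illustrative assumptions, and dual constraints are omitted.

```python
# Hypothetical sketch (not the paper's code): dual-space Gauss-Newton step
# for a squared loss, solved matrix-free with conjugate gradient in the
# k-dimensional output space.
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def gauss_newton_step(f, params, residual, eta):
    """Gauss-Newton direction for 0.5 * ||f(params) - y||^2 via the dual system.

    f        : maps parameters (R^p) to network outputs (R^k)
    residual : f(params) - y, shape (k,)
    eta      : proximity (step size) parameter
    """
    # Matrix-free operator v -> (J J^T + I/eta) v, acting on output space.
    def matvec(v):
        _, vjp_fn = jax.vjp(f, params)
        (jt_v,) = vjp_fn(v)                        # J^T v (parameter space)
        _, jjt_v = jax.jvp(f, (params,), (jt_v,))  # J J^T v (output space)
        return jjt_v + v / eta

    # Solve the small k x k dual system with conjugate gradient.
    lam, _ = cg(matvec, -residual)

    # Map the dual solution back to a parameter-space direction d = J^T lam.
    _, vjp_fn = jax.vjp(f, params)
    (direction,) = vjp_fn(lam)
    return direction
```

Because CG only needs products with J and J^T, the k x k matrix is never formed explicitly; this is the sense in which a dual solver can integrate with autodiff through linear operators.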
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lorenzo_Orecchia1
Submission Number: 2869