Keywords: neural networks, Hessian, learning rate, projections, optimization
TL;DR: Cheap computation of tensors related to the higher-order derivatives of the loss, and application to second-order optimization of neural networks.
Abstract: When training large models such as neural networks,
the full derivatives of order 2 and beyond are usually out of reach,
due to their computational cost.
This is why second-order optimization methods commonly
bypass the explicit computation of the Hessian and rely instead on
first-order information, such as the gradient with respect to the parameters (e.g., quasi-Newton methods)
or the activations (e.g., K-FAC).
In this paper, we focus on the exact and explicit computation
of projections of the Hessian and of higher-order derivatives onto
well-chosen subspaces that are relevant for optimization.
Namely, for a given partition of the set of parameters,
it is possible to compute tensors that can be seen as
"higher-order derivatives according to the partition",
at a reasonable cost as long as the number of subsets in
the partition remains small.
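As a concrete illustration, here is a minimal sketch (not the authors' code; the toy loss, parameter names, and the choice of one gradient direction per subset are all hypothetical assumptions) of how such projected tensors can be obtained with automatic differentiation: restricting the loss to one step size per subset turns its derivatives into small S-dimensional tensors, computable at roughly the cost of differentiating a function of S variables.

```python
# Minimal sketch (not the authors' code): computing "derivatives according to
# a partition" with JAX. The toy loss, parameter names, and the choice of one
# gradient direction per subset are hypothetical assumptions.
import jax
import jax.numpy as jnp

def loss(params):
    # Hypothetical toy loss over two parameter subsets, with a small coupling
    # term so that cross-subset second derivatives are nonzero.
    return (0.5 * jnp.sum(params["w1"] ** 2)
            + jnp.sum(params["w2"] ** 4)
            + 0.1 * jnp.sum(params["w1"][:3] * params["w2"]))

params = {"w1": jnp.ones(5), "w2": jnp.ones(3)}
groups = ["w1", "w2"]              # partition: one subset per parameter group
dirs = jax.grad(loss)(params)      # one update direction per subset (here: the gradient)

def restricted(eta):
    # Move each subset along its own direction, scaled by its own step eta[s];
    # the derivatives of this function of S variables are the projected tensors.
    moved = {k: params[k] - eta[s] * dirs[k] for s, k in enumerate(groups)}
    return loss(moved)

S = len(groups)
eta0 = jnp.zeros(S)
g_bar = jax.grad(restricted)(eta0)                  # projected gradient, shape (S,)
H_bar = jax.hessian(restricted)(eta0)               # projected Hessian, shape (S, S)
T_bar = jax.jacfwd(jax.hessian(restricted))(eta0)   # projected order-3 tensor, shape (S, S, S)
```

Since the restricted function has only S inputs, these tensors stay cheap to form as long as S (the number of subsets in the partition) remains small.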
Then, we propose an optimization method that exploits
these tensors at orders 2 and 3 and has several interesting properties, including:
it outputs a learning rate per subset of parameters, which can
be used for hyperparameter tuning;
it takes into account long-range interactions
between the layers of the trained neural network,
which similar methods (e.g., K-FAC) usually do not;
and the optimization trajectory is invariant under
layer-wise affine reparameterization.
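To illustrate the order-2 part of such a method, continuing the hypothetical sketch above: a damped Newton step on the restricted function yields one learning rate per subset of parameters. The damping term is an assumption added here for numerical safety, and how the order-3 tensor T_bar enters the actual update is specific to the paper's method and not reproduced here.

```python
# Continuing the sketch above: a Newton step in the small S-dimensional
# subspace produces one learning rate per subset of parameters.
# The damping is an assumption for numerical safety, not taken from the paper.
eta_star = jnp.linalg.solve(H_bar + 1e-3 * jnp.eye(S), -g_bar)  # per-subset learning rates
params = {k: params[k] - eta_star[s] * dirs[k] for s, k in enumerate(groups)}
```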
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3114