Ellipsoidal Trust Region Methods for Neural Network Training

Leonard Adolphs; Jonas Kohler; Aurelien Lucchi

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: We propose a generalization of adaptive gradient methods to second-order algorithms.

Abstract: We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks. This approach can be seen as a higher-order counterpart of adaptive gradient methods, which we show here to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we show that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities. Furthermore, we run experiments across different neural architectures and datasets and find that the ellipsoidal constraints consistently outperform their spherical counterpart, both in terms of the number of backpropagations and the asymptotic loss value. Finally, we find comparable performance to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of time.
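
The first-order interpretation mentioned in the abstract can be illustrated with a minimal sketch (not the authors' code). Minimizing the linear model g^T s subject to the ellipsoidal constraint sqrt(s^T A s) <= radius, with A diagonal and positive, has the closed-form solution s = -radius * A^{-1} g / sqrt(g^T A^{-1} g). The example assumes an RMSProp-style diagonal A = diag(sqrt(v) + eps); the function name and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def ellipsoidal_first_order_step(grad, diag_a, radius):
    """Minimize grad^T s subject to sqrt(s^T A s) <= radius, with A = diag(diag_a) > 0."""
    # Preconditioned gradient A^{-1} g (elementwise, because A is diagonal).
    precond = [g / a for g, a in zip(grad, diag_a)]
    # Dual norm of the gradient: sqrt(g^T A^{-1} g).
    dual_norm = math.sqrt(sum(g * p for g, p in zip(grad, precond)))
    if dual_norm == 0.0:
        # Zero gradient: the linear model is flat, so stay put.
        return [0.0 for _ in grad]
    # The minimizer lies on the boundary of the ellipsoid.
    return [-radius * p / dual_norm for p in precond]


# Hypothetical values: a gradient and a running average of squared gradients.
grad = [3.0, -4.0]
v = [9.0, 16.0]
eps = 1e-8
diag_a = [math.sqrt(vi) + eps for vi in v]  # assumed RMSProp-style preconditioner

step = ellipsoidal_first_order_step(grad, diag_a, radius=0.5)

# Norm of the step in the A-metric; it should equal the trust region radius.
a_norm = math.sqrt(sum(a * s * s for a, s in zip(diag_a, step)))
print(step, a_norm)
```

Under these assumptions the step points along the negative preconditioned gradient, and the trust region radius plays the role of the step length, which is the sense in which an adaptive gradient update resembles a first-order ellipsoidal trust region step.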

Code: https://www.dropbox.com/sh/cs8cokvhirjfit1/AAA-NiyMXGCrZsJPBFAXbAHUa?dl=0

Keywords: non-convex, optimization, neural networks, trust-region
